FLUCTUATIONS OF EIGENVALUES AND SECOND 
ORDER POINCARE INEQUALITIES 

Abstract. Linear statistics of eigenvalues in many familiar classes of 
random matrices are known to obey gaussian central limit theorems. 
The proofs of such results are usually rather difficult, involving hard 
computations specific to the model in question. In this article we at- 
tempt to formulate a unified technique for deriving such results via rel- 
atively soft arguments. In the process, we introduce a notion of 'second 
order Poincare inequalities': just as ordinary Poincare inequalities give 
variance bounds, second order Poincare inequalities give central limit 
theorems. The proof of the main result employs Stein's method of nor- 
mal approximation. A number of examples are worked out, some of 
which are new. One of the new results is a CLT for the spectrum of 
gaussian Toeplitz matrices. 



1. Introduction 

Suppose $A_n$ is an $n \times n$ matrix with real or complex entries and eigenvalues $\lambda_1, \ldots, \lambda_n$, repeated by multiplicities. A linear statistic of the eigenvalues of $A_n$ is a function of the form $\sum_{i=1}^n f(\lambda_i)$, where $f$ is some fixed function. 

Central limit theorems for linear statistics of eigenvalues of large dimen- 
sional random matrices have received considerable attention in recent years. 
A very curious feature that makes these results unusual and interesting is 
that they usually do not require normalization, i.e. one does not have to 
divide by $\sqrt{n}$; only centering is enough. Moreover, they have important 
applications in statistics and other applied areas (see e.g. the recent survey 
by Johnstone [35]). 

The literature around the topic is quite large. To the best of our knowledge, the investigation of central limit theorems for linear statistics of eigenvalues of large dimensional random matrices began with the work of Jonsson [36] on Wishart matrices. The key idea is to express $\sum_i \lambda_i^k$ as 

\[ \sum_{i=1}^n \lambda_i^k = \mathrm{Tr}(A_n^k) = \sum_{i_1, i_2, \ldots, i_k} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_k i_1}, \]

where $A_n = (a_{ij})$ is an $n \times n$ Wishart matrix, and then apply the method of moments to show that this is gaussian in the large $n$ limit. In fact, Jonsson proves the joint convergence of the law of $(\mathrm{Tr}(A_n), \mathrm{Tr}(A_n^2), \ldots, \mathrm{Tr}(A_n^p))$ to a multivariate normal distribution (where $p$ is fixed). 
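The trace identity behind the method of moments can be checked numerically on a small example. The following is only a sketch: the matrix size, the scaling of the Wishart-type matrix, and the helper name `trace_power_by_paths` are illustrative assumptions, not part of the original argument.

```python
import itertools
import numpy as np

def trace_power_by_paths(a, k):
    """Sum of a[i1,i2] * a[i2,i3] * ... * a[ik,i1] over all index tuples (i1, ..., ik)."""
    n = a.shape[0]
    total = 0.0
    for idx in itertools.product(range(n), repeat=k):
        term = 1.0
        for s in range(k):
            term *= a[idx[s], idx[(s + 1) % k]]
        total += term
    return total

rng = np.random.default_rng(0)
y = rng.standard_normal((4, 6))
a = y @ y.T / 6.0                      # small Wishart-type matrix (illustrative scaling)
k = 3

eigs = np.linalg.eigvalsh(a)
sum_of_powers = np.sum(eigs ** k)                        # sum_i lambda_i^k
trace_of_power = np.trace(np.linalg.matrix_power(a, k))  # Tr(A^k)
path_sum = trace_power_by_paths(a, k)                    # sum over closed index paths
```

The three quantities agree up to floating point error; the path-sum form is the one that the method of moments expands combinatorially.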

A similar study for Wigner matrices was carried out by Sinai and Soshnikov [46, 47]. A deep and difficult aspect of the Sinai-Soshnikov results is that they get central limit theorems for $\mathrm{Tr}(A_n^{p_n})$, where $p_n$ is allowed to grow at the rate $o(n^{2/3})$, instead of remaining fixed. They also get CLTs for $\mathrm{Tr}(f(A_n))$ for analytic $f$. 

Incidentally, for gaussian Wigner matrices, the best available results are due to Johansson [34], who characterized a large (but not exhaustive) class of functions for which the CLT holds. In fact, Johansson proved a general result for linear statistics of eigenvalues of random matrices whose entries have a joint density with respect to Lebesgue measure of the form $Z_n^{-1} \exp(-n \,\mathrm{Tr}\, V(A))$, where $V$ is a polynomial function and $Z_n$ is the normalizing constant. These models are widely studied in the physics literature. Johansson's proof relies on a delicate analysis of the joint density of the eigenvalues, which is explicitly known for this class of matrices. 

Another important contribution is the work of Diaconis and Evans [21], who proved similar results for random unitary matrices. Again, the basic approach relies on the method of moments, but the computations require new ideas because of the lack of independence between the matrix entries. However, as shown in [20, 21], strikingly exact computations are possible in this case by invoking some deep connections between symmetric function theory and the unitary group. 

An alternative approach, based on Stieltjes transforms, has been devel- 
oped in Bai and Yao [5] and Bai and Silverstein [6]. This approach has its 
roots in the semi-rigorous works of Girko [24] and Khorunzhy, Khoruzhenko, 
and Pastur [38] . 

Yet another line of attack, via stochastic calculus, was initiated in the work of Cabanal-Duvillard [14]. The ideas were used by Guionnet [26] to prove central limit theorems for certain band matrix models. Far reaching results for a very general class of band matrix models were later obtained using combinatorial techniques by Anderson and Zeitouni [1]. 

Other influential ideas, sometimes at varying levels of rigor, come from the papers of Costin and Lebowitz [19], Boutet de Monvel, Pastur and Shcherbina [12], Johansson [33], Keating and Snaith [37], Hughes et al. [30], Soshnikov [48], Israelson [31] and Wieand [52]. The recent works of Anderson and Zeitouni [2], Dumitriu and Edelman [22], Rider and Silverstein [44], Rider and Virag [43], Jiang [32], and Hachem et al. [28, 29] provide several illuminating insights and new results. The recent advances in the theory of second order freeness (introduced by Mingo and Speicher [41]) are also of great interest. 

In this paper we introduce a result (Theorem 3.1) that may provide a unified 'soft tool' for matrices that can be easily expressed as smooth functions of independent random variables. The tool is soft in the sense that we only need to calculate various upper and lower bounds rather than perform exact computations of limits as required for existing methods. (In this context, it should be noted that soft arguments are possible even in the combinatorial techniques, if one works with cumulants instead of moments, e.g. as in [1], Lemma 4.10). 

We demonstrate the scope of our approach with applications to general- 
ized Wigner matrices, gaussian matrices with arbitrary correlation structure, 
gaussian Toeplitz matrices, Wishart matrices, and double Wishart matrices. 

1.1. The intuitive idea. Let us now briefly describe the main idea. Suppose $X = (X_1, \ldots, X_n)$ is a vector of independent standard gaussian random variables, and $g : \mathbb{R}^n \to \mathbb{R}$ is a smooth function. Let $\nabla g$ denote the gradient of $g$. We know that if $\|\nabla g(X)\|$ is typically small, then $g(X)$ has small fluctuations. In fact, the gaussian Poincare inequality says that 

\[ \mathrm{Var}(g(X)) \le \mathbb{E}\|\nabla g(X)\|^2. \tag{1} \]

Thus, the size of $\nabla g$ controls the variance of $g(X)$. Based on this, consider the following speculation: Is it possible to extend the Poincare inequality to the 'second order', as a method of determining whether $g(X)$ is approximately gaussian by inspecting the behavior of the second order derivatives of $g$? 
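As a quick illustration of the gaussian Poincare inequality (1), the two sides can be estimated by Monte Carlo. This is a minimal sketch; the test function $g(x) = \sum_i \sin(x_i)$, the dimension, the sample size, and the helper name `poincare_sides` are all assumptions chosen for illustration.

```python
import numpy as np

def poincare_sides(g, grad_g, n, num_samples=200_000, seed=0):
    """Monte Carlo estimates of Var(g(X)) and E||grad g(X)||^2 for standard gaussian X in R^n."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, n))
    variance = np.var(g(x))
    mean_sq_grad = np.mean(np.sum(grad_g(x) ** 2, axis=1))
    return variance, mean_sq_grad

# Illustrative choice: g(x) = sum_i sin(x_i), whose gradient is (cos(x_1), ..., cos(x_n)).
g = lambda x: np.sum(np.sin(x), axis=1)
grad_g = lambda x: np.cos(x)

variance, mean_sq_grad = poincare_sides(g, grad_g, n=5)
```

For this choice both sides have closed forms, $n(1 - e^{-2})/2$ and $n(1 + e^{-2})/2$, so the estimates can be compared against exact values.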

The speculation turns out to be correct (and useful for random matrices), 
although in a rather mysterious way. The following example is representative 
of a general phenomenon. 

Suppose $B$ is a fixed $n \times n$ real symmetric matrix, and the function $g : \mathbb{R}^n \to \mathbb{R}$ is defined as 

\[ g(x) = x^t B x, \]

where $x^t$ denotes the transpose of the vector $x$. Let $X = (X_1, \ldots, X_n)$ be a vector of independent standard gaussian random variables, and let us ask the question "When is $g(X)$ approximately gaussian?". 

Now, if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $B$ with corresponding eigenvectors $u_1, u_2, \ldots, u_n$, then 

\[ g(X) = \sum_{i=1}^n \lambda_i Y_i^2, \]

where $Y_i = u_i^t X$. Since we can assume without loss of generality that $u_1, \ldots, u_n$ are orthonormal, $Y_1, \ldots, Y_n$ are again i.i.d. standard gaussian. This seems to suggest that $g(X)$ is approximately gaussian if and only if 'no eigenvalue dominates in the sum'. In fact, one can show that $g(X)$ is approximately gaussian if and only if 

\[ \max_i |\lambda_i|^2 \ll \sum_i \lambda_i^2. \]






SOURAV CHATTERJEE 



Now $\nabla^2 g(x) = 2B$, where $\nabla^2 g$ denotes the Hessian matrix of $g$. Thus, the question about the gaussianity of $g(X)$ can be reduced to a question about the negligibility of the operator norm squared of $\nabla^2 g(X)$ ($= 4\max_i |\lambda_i|^2$) in comparison to the variance of $g(X)$ ($= 2\sum_i \lambda_i^2$). 
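The 'no eigenvalue dominates' criterion for quadratic forms is easy to evaluate for a given matrix. The sketch below computes the ratio of the largest squared eigenvalue to the sum of squared eigenvalues; the function name `dominance_ratio` and the two test matrices are illustrative assumptions.

```python
import numpy as np

def dominance_ratio(b):
    """max_i lambda_i^2 / sum_i lambda_i^2 for a real symmetric matrix b."""
    lam = np.linalg.eigvalsh(b)
    return np.max(lam ** 2) / np.sum(lam ** 2)

# Identity matrix: all eigenvalues equal, so the ratio is 1/n (small for large n).
ratio_identity = dominance_ratio(np.eye(100))

# Rank-one matrix: a single eigenvalue carries everything, so the ratio is 1.
ratio_rank_one = dominance_ratio(np.ones((50, 50)))
```

A ratio near zero corresponds to the regime in which the quadratic form is approximately gaussian; a ratio near one corresponds to a form dominated by a single chi-square term.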

In Theorem 2.2 we generalize this notion to show that for any smooth $g$, $g(X)$ is approximately gaussian whenever the typical size of the operator norm squared of $\nabla^2 g(X)$ is small compared to $\mathrm{Var}(g(X))$, and a few other conditions are satisfied. An outline of the rigorous proof is given in the next subsection. 

The idea is applied to random matrices as follows. We consider random 
matrices that can be easily expressed as functions of independent random 
variables, and think of the linear statistics of eigenvalues as functions of 
these independent variables. The setup can be pictorially represented as 

\[ \text{large vector } X \;\to\; \text{matrix } A(X) \;\to\; \text{linear statistic } \sum_i f(\lambda_i) =: g(X). \]

The main challenge is to evaluate the second order partial derivatives of $g$. However, our task is simplified (and the argument is 'soft') because we only need bounds and not exact computations. Still, a considerable amount of bookkeeping is involved. We provide a 'finished product' in Theorem 3.1 for the convenience of potential future users of the method. 

A discrete version of this idea is investigated in the author's earlier pa- 
per [16]. However, no familiarity with [16] is required here. 

1.2. Outline of the proof via Stein's method. The argument for general $g$ is not as intuitive as for quadratic forms. It begins with Stein's method [49, 50]: If a random variable $W$ satisfies $\mathbb{E}(\varphi(W)W) \approx \mathbb{E}(\varphi'(W))$ for a large class of functions $\varphi$, then $W$ is approximately standard gaussian. The idea stems from the fact that if $W$ is exactly standard gaussian, then $\mathbb{E}(\varphi(W)W) = \mathbb{E}(\varphi'(W))$ for all absolutely continuous $\varphi$ for which both sides are well defined. Stein's lemma (Lemma 5.1 in this paper) makes this precise with error bounds. 

Now suppose we are given a random variable $W$, and there is a function $h$ such that for all absolutely continuous $\varphi$, 

\[ \mathbb{E}(\varphi(W)W) = \mathbb{E}(\varphi'(W)h(W)). \tag{2} \]

For example, if $W$ has a density $\rho$ with respect to Lebesgue measure, and $\mathbb{E}(W) = 0$, $\mathbb{E}(W^2) = 1$, then the function 

\[ h(x) = \frac{\int_x^\infty y \rho(y)\,dy}{\rho(x)} \]

serves the purpose. Now if $h(W) \approx 1$ in a probabilistic sense, then we can conclude that 

\[ \mathbb{E}(\varphi(W)W) \approx \mathbb{E}(\varphi'(W)), \]

and it would follow by Stein's method that $W$ is approximately standard gaussian. This idea already occurs in the literature on normal approximation [15]. However, it is not at all clear how one can infer facts about $h(W)$ when $W$ is an immensely complex object like a linear statistic of eigenvalues of a Wigner matrix. One of the main contributions of this paper is an explicit formula for $h(W)$ when $W$ can be expressed as a differentiable function of a collection of independent gaussian random variables. 
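The density-based formula for $h$ can be evaluated numerically. The sketch below is illustrative only: the helper `stein_h`, the truncation point of the integral, and the two example densities (standard normal, and uniform on $[-\sqrt{3}, \sqrt{3}]$, both with mean 0 and variance 1) are assumptions made for the example.

```python
import numpy as np

def stein_h(density, x, upper=12.0, num_points=200_001):
    """Approximate h(x) = (integral from x to infinity of y*rho(y) dy) / rho(x).

    The integral is truncated at `upper` and computed with the trapezoid rule.
    """
    y = np.linspace(x, upper, num_points)
    vals = y * density(y)
    dy = y[1] - y[0]
    integral = dy * (np.sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / density(x)

def normal_density(y):
    return np.exp(-0.5 * y ** 2) / np.sqrt(2.0 * np.pi)

def uniform_density(y):
    half_width = np.sqrt(3.0)   # uniform on [-sqrt(3), sqrt(3)] has variance 1
    return np.where(np.abs(y) <= half_width, 1.0 / (2.0 * half_width), 0.0)

h_normal = stein_h(normal_density, 0.7)    # h is identically 1 for the standard gaussian
h_uniform = stein_h(uniform_density, 0.5)  # closed form here: (3 - x^2) / 2
```

For the standard gaussian the function $h$ is identically 1, while for the uniform density it is $(3 - x^2)/2$ on the support, which is far from constant; this is the sense in which $h(W) \approx 1$ measures closeness to gaussianity.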

Lemma 1.1. Suppose $X = (X_1, \ldots, X_n)$ is a vector of independent standard gaussian random variables, and $g : \mathbb{R}^n \to \mathbb{R}$ is an absolutely continuous function. Let $W = g(X)$, and suppose that $\mathbb{E}(W) = 0$ and $\mathbb{E}(W^2) = 1$. Suppose $h$ is a function satisfying (2) for all Lipschitz $\varphi$. Then $h(W) = \mathbb{E}(T(X) \mid W)$, where 

\[ T(x) := \int_0^1 \frac{1}{2\sqrt{t}}\, \mathbb{E}\Big(\sum_{i=1}^n \frac{\partial g}{\partial x_i}(x)\, \frac{\partial g}{\partial x_i}\big(\sqrt{t}\,x + \sqrt{1-t}\,X\big)\Big)\, dt. \]

Barring the technical details, the proof of this lemma is surprisingly simple. To establish (2), we only have to show that for all Lipschitz $\varphi$, 

\[ \mathbb{E}(\varphi(W)W) = \mathbb{E}(\varphi'(W)T(X)). \]

This is achieved via gaussian interpolation. Let $X'$ be an independent copy of $X$, and let $W_t = g(\sqrt{t}X + \sqrt{1-t}X')$. Since $\mathbb{E}(W) = 0$, we have 

\[ \mathbb{E}(\varphi(W)W) = \mathbb{E}(\varphi(W)(W_1 - W_0)) = \mathbb{E}\Big(\int_0^1 \varphi(W) \frac{\partial W_t}{\partial t}\, dt\Big). \]

Integration by parts on the right hand side gives the desired result. The details of the proof are contained in the proof of the more elaborate Lemma 5.3 in Section 5. 

Since $\mathbb{E}(W^2) = 1$, taking $\varphi(x) = x$ it follows that $\mathbb{E}(h(W)) = 1$. Combining this with the fact that $\mathrm{Var}(h(W)) \le \mathrm{Var}(T(X))$, we see that we only have to bound $\mathrm{Var}(T(X))$ to show that $W$ is approximately gaussian. Now, if $g$ is a complicated function, $T$ is even more complicated. Hence, we cannot expect to evaluate $\mathrm{Var}(T(X))$. On the other hand, we can always use the gaussian Poincare inequality (1) to compute a bound on $\mathrm{Var}(T(X))$. This involves working with $\nabla T$. Since $T$ already involves the first order derivatives of $g$, $\nabla T$ brings the second order derivatives into the picture. This is how we relate the smallness of the Hessian of $g$ to the approximate gaussianity of $g(X)$, leading to Theorem 2.2 in the next section. 

We should mention here that a problem with Lemma 1.1 is that we have to know how to center and scale $W$ so that $\mathbb{E}(W) = 0$ and $\mathbb{E}(W^2) = 1$. This may not be easy in practice. 

It is also worth noting that Lemma 1.1 can, in fact, be used to prove the gaussian Poincare inequality (1), just by taking $\varphi(x) = x$ and applying the Cauchy-Schwarz inequality to bound the terms inside the integral in the expression for $\mathbb{E}(T)$. In this sense, one can view Lemma 1.1 as a generalization of the gaussian Poincare inequality. Incidentally, the first proof of the gaussian Poincare inequality in the probability literature is due to H. Chernoff [18] who used Hermite polynomial expansions. However, such inequalities have been known to analysts for a long time under the name of 'Hardy inequalities with weights' (see e.g. Muckenhoupt [42]). 

We should also mention two other concepts from the existing literature that may be related to this work. The first is the notion of the 'zerobias transform' of $W$, as defined by Goldstein and Reinert [25]. A random variable $W^*$ is called a zerobias transform of $W$ if for all $\varphi$, we have 

\[ \mathbb{E}(\varphi(W)W) = \mathbb{E}(\varphi'(W^*)). \]

A little consideration shows that our function $h$ is just the density of the law of $W^*$ with respect to the law of $W$ when the laws are absolutely continuous with respect to each other. However, while it is quite difficult to construct zerobias transforms (not known at present for linear statistics of eigenvalues), Lemma 1.1 gives a direct formula for $h$. 

The second related idea is the work of Borovkov and Utev [10] which says that if a random variable $W$ with $\mathbb{E}(W) = 0$ and $\mathbb{E}(W^2) = 1$ satisfies a Poincare inequality with Poincare constant close to 1, then $W$ is approximately standard gaussian (if the Poincare constant is exactly 1, then $W$ is exactly standard gaussian). As shown by Chen [17], this fact can be used to prove central limit theorems in ways that are closely related to Stein's method. Although it seems plausible, we could not detect any apparent relationship between this concept and our method of extending Poincare inequalities to the second order. 

2. Second order Poincare inequalities 

All our results are for functions of random variables belonging to the 
following class of distributions. 

Definition 2.1. For each $c_1, c_2 \ge 0$, let $\mathcal{L}(c_1, c_2)$ be the class of probability measures on $\mathbb{R}$ that arise as laws of random variables of the form $u(Z)$, where $Z$ is a standard gaussian r.v. and $u$ is a twice continuously differentiable function such that for all $x \in \mathbb{R}$ 

\[ |u'(x)| \le c_1 \quad \text{and} \quad |u''(x)| \le c_2. \]

For example, the standard gaussian law is in $\mathcal{L}(1, 0)$. Again, taking $u$ = the gaussian cumulative distribution function, we see that the uniform distribution on the unit interval is in $\mathcal{L}((2\pi)^{-1/2}, (2\pi e)^{-1/2})$. For simplicity, we just say that a random variable $X$ is "in $\mathcal{L}(c_1, c_2)$" instead of the more elaborate statement that "the distribution of $X$ belongs to $\mathcal{L}(c_1, c_2)$". 
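The constants in the uniform example can be verified numerically. With $u$ the gaussian cumulative distribution function, $u'$ is the gaussian density and $u''(x) = -x u'(x)$; the sketch below takes the suprema over a grid. The grid range and resolution are assumptions chosen for illustration.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 200_001)
phi = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)   # u'(x) when u is the gaussian CDF

sup_u1 = np.max(np.abs(phi))          # sup |u'|, attained at x = 0
sup_u2 = np.max(np.abs(-x * phi))     # sup |u''| with u''(x) = -x * phi(x), attained at |x| = 1

c1 = (2.0 * np.pi) ** -0.5            # claimed bound on |u'|
c2 = (2.0 * np.pi * np.e) ** -0.5     # claimed bound on |u''|
```

Both suprema match the stated constants, confirming that the uniform law on the unit interval lies in $\mathcal{L}((2\pi)^{-1/2}, (2\pi e)^{-1/2})$.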

Recall that for any two random variables $X$ and $Y$, the supremum of $|P(X \in B) - P(Y \in B)|$ as $B$ ranges over all Borel sets is called the total variation distance between the laws of $X$ and $Y$, often denoted simply by $d_{TV}(X, Y)$. Note that the total variation distance remains unchanged under any transformation like $(X, Y) \to (f(X), f(Y))$ where $f$ is a measurable bijective map. Next, recall that the operator norm of an $m \times n$ real or complex matrix $A$ is defined as 

\[ \|A\| := \sup\{\|Ax\| : x \in \mathbb{C}^n, \|x\| = 1\}. \]

Recall that $\|A\|^2$ is the largest eigenvalue of $A^*A$. If $A$ is a hermitian matrix, $\|A\|$ is just the spectral radius (i.e. the largest absolute value of an eigenvalue) of $A$. This is the default norm for matrices in this paper, although occasionally we use the Hilbert-Schmidt norm 

\[ \|A\|_{HS} := \Big(\sum_{i,j} |a_{ij}|^2\Big)^{1/2}. \]
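Both norms are available in NumPy, which makes the stated facts easy to check on an example. The random complex matrix below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

op_norm = np.linalg.norm(a, 2)        # operator norm: largest singular value
hs_norm = np.linalg.norm(a, 'fro')    # Hilbert-Schmidt (Frobenius) norm

# Largest eigenvalue of A* A, which should equal the squared operator norm.
top_eig = np.max(np.linalg.eigvalsh(a.conj().T @ a))
```

The operator norm never exceeds the Hilbert-Schmidt norm, and the latter is at most $\sqrt{\mathrm{rank}(A)}$ times the former.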



The following theorem gives normal approximation bounds for general smooth functions of independent random variables whose laws are in $\mathcal{L}(c_1, c_2)$ for some finite $c_1, c_2$. 

Theorem 2.2. Let $X = (X_1, \ldots, X_n)$ be a vector of independent random variables in $\mathcal{L}(c_1, c_2)$ for some finite $c_1, c_2$. Take any $g \in C^2(\mathbb{R}^n)$ and let $\nabla g$ and $\nabla^2 g$ denote the gradient and Hessian of $g$. Let 

\[ \kappa_0 = \Big(\mathbb{E}\sum_{i=1}^n \Big|\frac{\partial g}{\partial x_i}(X)\Big|^4\Big)^{1/2}, \quad \kappa_1 = (\mathbb{E}\|\nabla g(X)\|^4)^{1/4}, \quad \text{and} \quad \kappa_2 = (\mathbb{E}\|\nabla^2 g(X)\|^4)^{1/4}. \]

Suppose $W = g(X)$ has a finite fourth moment and let $\sigma^2 = \mathrm{Var}(W)$. Let $Z$ be a normal random variable having the same mean and variance as $W$. Then 

\[ d_{TV}(W, Z) \le \frac{2\sqrt{5}\,(c_1 c_2 \kappa_0 + c_1^3 \kappa_1 \kappa_2)}{\sigma^2}. \]

If we slightly change the setup by assuming that $X$ is a gaussian random vector with mean 0 and covariance matrix $\Sigma$, keeping all other notation the same, then the corresponding bound is 

\[ d_{TV}(W, Z) \le \frac{2\sqrt{5}\,\|\Sigma\|^{3/2} \kappa_1 \kappa_2}{\sigma^2}. \]

Note that when $X_1, \ldots, X_n$ are gaussian, we have $c_2 = 0$, and the first bound becomes simpler. For an elementary illustrative application of Theorem 2.2, consider the function 

\[ g(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n-1} x_i x_{i+1}. \]

Then 

\[ \frac{\partial g}{\partial x_i} = \frac{x_{i-1} + x_{i+1}}{\sqrt{n}} \]

with the convention that $x_0 = x_{n+1} = 0$. Again, 

\[ \frac{\partial^2 g}{\partial x_i \partial x_j} = \begin{cases} 1/\sqrt{n} & \text{if } |i - j| = 1, \\ 0 & \text{otherwise.} \end{cases} \]

It follows that $\kappa_0 = O(1/\sqrt{n})$, $\kappa_1 = O(1)$, and $\kappa_2 = O(1/\sqrt{n})$, which gives a total variation error bound of order $1/\sqrt{n}$. Note that the usual way to prove a CLT for $n^{-1/2}\sum_{i=1}^{n-1} X_i X_{i+1}$ is via martingale arguments, but total variation bounds are not trivial to obtain along that route. 
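The order of the Hessian term in this example can be checked directly, since the Hessian of $g(x) = n^{-1/2}\sum_{i=1}^{n-1} x_i x_{i+1}$ does not depend on $x$. The following sketch builds that matrix and computes its operator norm; the function names are hypothetical and the comparison with $2\cos(\pi/(n+1))/\sqrt{n}$ uses the known spectrum of the path-graph adjacency matrix.

```python
import numpy as np

def hessian_of_example(n):
    """Hessian of g(x) = n^{-1/2} * sum_{i<n} x_i x_{i+1}: entries 1/sqrt(n) where |i - j| = 1."""
    h = np.zeros((n, n))
    i = np.arange(n - 1)
    h[i, i + 1] = 1.0 / np.sqrt(n)
    h[i + 1, i] = 1.0 / np.sqrt(n)
    return h

def hessian_operator_norm(n):
    """Operator norm of the (constant) Hessian, i.e. the fourth-moment Hessian quantity here."""
    return np.linalg.norm(hessian_of_example(n), 2)

norm_100 = hessian_operator_norm(100)
norm_400 = hessian_operator_norm(400)
```

Because the Hessian is deterministic, its operator norm is exactly the quantity appearing in the theorem for this $g$, and it is bounded by $2/\sqrt{n}$, consistent with the stated $O(1/\sqrt{n})$ order.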

Remarks. (i) Theorem 2.2 can be viewed as a second order analog of the gaussian Poincare inequality (1). While the Poincare inequality implies that $g(X)$ is concentrated whenever the individual coordinates have small 'influence' on the outcome, Theorem 2.2 says that if in addition, the 'interaction' between the coordinates is small, then $g(X)$ has gaussian behavior. The magnitude of $\|\nabla^2 g(X)\|$ is a measure of this interaction. 

(ii) The smallness of $\|\nabla^2 g(X)\|$ does not seem to imply that $g(X)$ has any special structure, at least from what the author understands. In particular, it does not imply that $g(X)$ breaks up as an approximately additive function as in Hajek projections [51, 23]. It is quite mysterious, at the present level of understanding, as to what causes the gaussianity. 

(iii) A problem with Theorem 2.2 is that it does not say anything about $\sigma^2$. However, in practice, we only need to know a lower bound on $\sigma^2$ to use Theorem 2.2 for proving a CLT. Sometimes this may be a lot easier to achieve than computing the exact limiting value of $\sigma^2$. This is demonstrated in some of our examples in Section 4. 

(iv) One may wonder why we work with random variables in $\mathcal{L}(c_1, c_2)$ instead of just gaussian random variables. Indeed, the main purpose of this limited generality is simply to pre-empt the question 'Does your result extend to the non-gaussian case?'. However, it is more serious than that: The true rate of convergence may actually differ significantly depending on whether $X$ is gaussian or not, as demonstrated in the case of Wigner matrices in Section 4. 

(v) There is a substantial body of literature on central limit theorems for 
general functions of independent random variables. Some examples of avail- 
able techniques are: (a) the classical method of moments, (b) the martingale 
approach and Skorokhod embeddings, (c) the method of Hajek projections 
and some sophisticated extensions (e.g. [51], [45], [23]), (d) Stein's method 
of normal approximation (e.g. [49], [50], [25]), and (e) the big-blocks-small-blocks technique and its modern multidimensional versions (e.g. [9], [3]). For further references, particularly on Stein's method, which is a cornerstone of our approach, we refer to [16]. Apart from the method of moments, none of the other techniques have been used for dealing with random matrix problems. 

3. The random matrix result 



Let $n$ be a fixed positive integer and $\mathcal{J}$ be a finite indexing set. Suppose that for each $1 \le i, j \le n$, we have a $C^2$ map $a_{ij} : \mathbb{R}^{\mathcal{J}} \to \mathbb{C}$. For each $x \in \mathbb{R}^{\mathcal{J}}$, let $A(x)$ be the complex $n \times n$ matrix whose $(i, j)$th element is $a_{ij}(x)$. Let 

\[ f(z) = \sum_{m=0}^\infty b_m z^m \]

be an analytic function on the complex plane. Let $X = (X_u)_{u \in \mathcal{J}}$ be a collection of independent random variables in $\mathcal{L}(c_1, c_2)$ for some finite $c_1, c_2$. Under this very general setup, we give an explicit bound on the total variation distance between the laws of $\mathrm{Re}\,\mathrm{Tr} f(A(X))$ and a gaussian random variable with matching mean and variance (here as usual, $\mathrm{Re}\, z$ and $\mathrm{Im}\, z$ denote the real and imaginary parts of a complex number $z$). 

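As mentioned before, the method involves some bookkeeping, partly due to the quest for generality. The algorithm requires the user to compute a few quantities associated with the matrix model, step by step as described below. First, let 

\[ \mathcal{R} = \Big\{\alpha \in \mathbb{C}^{\mathcal{J}} : \sum_{u \in \mathcal{J}} |\alpha_u|^2 = 1\Big\} \quad \text{and} \quad \mathcal{S} = \Big\{\beta \in \mathbb{C}^{n \times n} : \sum_{i,j=1}^n |\beta_{ij}|^2 = 1\Big\}. \tag{3} \]

Next, define three functions $\gamma_0$, $\gamma_1$ and $\gamma_2$ on $\mathbb{R}^{\mathcal{J}}$ as follows. 

\[ \begin{aligned} \gamma_0(x) &:= \sup_{u \in \mathcal{J},\, \|B\| = 1} \Big|\mathrm{Tr}\Big(B \frac{\partial A}{\partial x_u}\Big)\Big|, \\ \gamma_1(x) &:= \sup_{\alpha \in \mathcal{R},\, \beta \in \mathcal{S}} \Big|\sum_{u \in \mathcal{J}} \sum_{i,j=1}^n \alpha_u \beta_{ij} \frac{\partial a_{ij}}{\partial x_u}\Big|, \quad \text{and} \\ \gamma_2(x) &:= \sup_{\alpha, \alpha' \in \mathcal{R},\, \beta \in \mathcal{S}} \Big|\sum_{u, v \in \mathcal{J}} \sum_{i,j=1}^n \alpha_u \alpha'_v \beta_{ij} \frac{\partial^2 a_{ij}}{\partial x_u \partial x_v}\Big|. \end{aligned} \tag{4} \]

Define two entire functions $f_1$ and $f_2$ as 

\[ f_1(z) = \sum_{m=1}^\infty m |b_m| z^{m-1} \quad \text{and} \quad f_2(z) = \sum_{m=2}^\infty m(m-1) |b_m| z^{m-2}. \]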

Let $\lambda(x) = \|A(x)\|$ and $r(x) = \mathrm{rank}(A(x))$. Usually, of course, we will just have $r(x) = n$. Next, define three more functions 

\[ \begin{aligned} \eta_0(x) &= \gamma_0(x) f_1(\lambda(x)), \\ \eta_1(x) &= \gamma_1(x) f_1(\lambda(x)) \sqrt{r(x)}, \quad \text{and} \\ \eta_2(x) &= \gamma_2(x) f_1(\lambda(x)) \sqrt{r(x)} + \gamma_1(x)^2 f_2(\lambda(x)). \end{aligned} \]

Finally, define three quantities $\kappa_0$, $\kappa_1$, and $\kappa_2$ as 

\[ \kappa_0 = (\mathbb{E}(\eta_0(X)^2 \eta_1(X)^2))^{1/2}, \quad \kappa_1 = (\mathbb{E}\,\eta_1(X)^4)^{1/4}, \quad \text{and} \quad \kappa_2 = (\mathbb{E}\,\eta_2(X)^4)^{1/4}. \]

Let us pacify the possibly disturbed reader with the assurance that we only need bounds on $\kappa_0$, $\kappa_1$, and $\kappa_2$, as opposed to exact computations. This turns out to be particularly easy to achieve in all our examples. We are now ready to state the theorem. 

Theorem 3.1. Let all notation be as above. Suppose $W = \mathrm{Re}\,\mathrm{Tr} f(A(X))$ has finite fourth moment and let $\sigma^2 = \mathrm{Var}(W)$. Let $Z$ be a normal random variable with the same mean and variance as $W$. Then 

\[ d_{TV}(W, Z) \le \frac{2\sqrt{5}\,(c_1 c_2 \kappa_0 + c_1^3 \kappa_1 \kappa_2)}{\sigma^2}. \]

If we slightly change the setup by assuming that $X$ is a gaussian random vector with mean 0 and covariance matrix $\Sigma$, keeping all other notation the same, then the corresponding bound is 

\[ d_{TV}(W, Z) \le \frac{2\sqrt{5}\,\|\Sigma\|^{3/2} \kappa_1 \kappa_2}{\sigma^2}. \]

Remarks. (i) A problem with Theorem 3.1 is that it does not give a formula or approximation for $\sigma^2$. However, central limit theorems can still be proven if we can only compute suitable lower bounds for $\sigma^2$. In Section 4 we show that this is eminently possible in a variety of situations (e.g. Theorems 4.2 and 4.5). 

(ii) Although the result is stated for entire functions, the concrete error bound, combined with appropriate concentration inequalities, should make it possible to prove limit theorems for general $C^1$ functions wherever required. 

(iii) Note that the matrices need not be hermitian, and the random variables need not be symmetric around zero. However, it is a significant restriction that the $X_u$'s have to belong to $\mathcal{L}(c_1, c_2)$ for some finite $c_1, c_2$. In particular, they cannot be discrete. 

(iv) By considering $\alpha f$ instead of $f$ for arbitrary $\alpha \in \mathbb{C}$, we see that the normal approximation error bound can be computed for any linear combination of the real and imaginary parts of the trace. This allows us to prove central limit theorems for the complex statistic $\mathrm{Tr} f(A)$ via Wold's device. 

(v) It is somewhat surprising that such a general result can give useful error bounds for familiar random matrix models. Unfortunately, the case of random unitary and orthogonal matrices seems to be harder because of the complexity in expressing them as functions of independent random variables. This is under the scope of a future project. 

4. Applications 

This section is devoted to working out a number of applications of Theorem 2.2. In all cases, we produce a total variation error bound where the variance of the linear statistic, $\sigma^2$, appears as an unknown quantity. In some of the examples (e.g. Wigner and Wishart matrices), the limiting value of $\sigma^2$ is known from the literature. In other cases, they are yet unknown, and the central limit theorems are proven modulo this lack of knowledge about $\sigma^2$. 

The following simple lemma turns out to be very useful for bounding $\gamma_0$, $\gamma_1$, and $\gamma_2$ in the examples. Recall the definitions of the operator norm and the Hilbert-Schmidt norm of matrices from Section 2. 

Lemma 4.1. Suppose $A_1, \ldots, A_n$ ($n \ge 3$) are real or complex matrices of dimensions such that the product $A_1 A_2 \cdots A_n$ is defined. Then 

\[ \|A_1 A_2\|_{HS} \le \min\{\|A_1\| \|A_2\|_{HS},\ \|A_1\|_{HS} \|A_2\|\}. \tag{5} \]

Moreover, for any $1 \le i < j \le n$, 

\[ |\mathrm{Tr}(A_1 A_2 \cdots A_n)| \le \|A_i\|_{HS} \|A_j\|_{HS} \prod_{k \in [n] \setminus \{i, j\}} \|A_k\|. \]

Proof. Let $b_1, \ldots, b_n$ be the columns of $A_2$. Then 

\[ \|A_1 A_2\|_{HS}^2 = \mathrm{Tr}(A_2^* A_1^* A_1 A_2) = \sum_{i=1}^n \|A_1 b_i\|^2 \le \|A_1\|^2 \sum_{i=1}^n \|b_i\|^2 = \|A_1\|^2 \|A_2\|_{HS}^2. \]

Similarly, we have $\|A_1 A_2\|_{HS} \le \|A_1\|_{HS} \|A_2\|$. For the other inequality, note that a simple application of the Cauchy-Schwarz inequality shows that 

\[ |\mathrm{Tr}(A_1 A_2 \cdots A_n)| \le \|A_1 \cdots A_i\|_{HS} \|A_{i+1} \cdots A_n\|_{HS}. \]

Now by the inequality (5), 

\[ \|A_1 \cdots A_i\|_{HS} \le \|A_1 \cdots A_{i-1}\| \|A_i\|_{HS}. \]

Similarly, 

\[ \|A_{i+1} \cdots A_n\|_{HS} \le \|A_{i+1} \cdots A_{j-1}\| \|A_j \cdots A_n\|_{HS} \le \|A_{i+1} \cdots A_{j-1}\| \|A_j\|_{HS} \|A_{j+1} \cdots A_n\|. \]

This completes the proof. □ 
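The two inequalities of the lemma can be sanity-checked on random matrices. This is only a numerical illustration on one random instance, not a proof; the matrix sizes, the seed, and the choice of which two factors are measured in Hilbert-Schmidt norm are arbitrary assumptions.

```python
import numpy as np

def hs(m):
    """Hilbert-Schmidt (Frobenius) norm."""
    return np.linalg.norm(m, 'fro')

def op(m):
    """Operator norm (largest singular value)."""
    return np.linalg.norm(m, 2)

rng = np.random.default_rng(2)
mats = [rng.standard_normal((4, 4)) for _ in range(5)]

# Inequality (5) for the first two matrices.
a1, a2 = mats[0], mats[1]
lhs5 = hs(a1 @ a2)
rhs5 = min(op(a1) * hs(a2), hs(a1) * op(a2))

# Trace inequality: two chosen factors in HS norm, all others in operator norm.
i, j = 1, 3   # zero-based positions of the two factors measured in HS norm
trace_abs = abs(np.trace(np.linalg.multi_dot(mats)))
bound = hs(mats[i]) * hs(mats[j]) * np.prod(
    [op(m) for k, m in enumerate(mats) if k not in (i, j)]
)
```

In applications the point of the trace inequality is that only two factors pay the (typically larger) Hilbert-Schmidt norm, while the remaining factors are controlled by the operator norm.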

4.1. Generalized Wigner matrices. Suppose $X = (X_{ij})_{1 \le i \le j \le n}$ is a collection of independent random variables. Let $X_{ij} = X_{ji}$ for $i > j$ and let 

\[ A_n = A_n(X) := \frac{1}{\sqrt{n}} (X_{ij})_{1 \le i, j \le n}. \tag{6} \]

A matrix like $A_n$ is called a Wigner matrix. Central limit theorems for linear statistics of eigenvalues of Wigner matrices have been extensively studied in the literature (see e.g. [46, 47, 5, 1]). While the case of gaussian entries can be dealt with using analytical techniques [34], the general case requires heavy combinatorics. To give a flavor of the results in the literature, let us state one key theorem from [47] (although, technically, it is not a CLT for a fixed linear statistic). 
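The construction (6) and the associated linear statistics are simple to reproduce numerically. The following is a minimal sketch; the function names `wigner_matrix` and `linear_statistic`, the gaussian default for the entries, and the matrix size are assumptions made for illustration.

```python
import numpy as np

def wigner_matrix(n, rng, sampler=None):
    """Symmetric n x n matrix with independent entries on and above the diagonal, scaled by 1/sqrt(n).

    `sampler(shape)` draws the entries; it defaults to standard gaussian (an illustrative choice).
    """
    if sampler is None:
        sampler = rng.standard_normal
    upper = np.triu(sampler((n, n)))       # X_ij for i <= j
    x = upper + np.triu(upper, 1).T        # set X_ji = X_ij for i > j
    return x / np.sqrt(n)

def linear_statistic(a, f):
    """Sum of f over the eigenvalues of the symmetric matrix a."""
    return np.sum(f(np.linalg.eigvalsh(a)))

rng = np.random.default_rng(3)
a = wigner_matrix(200, rng)
stat = linear_statistic(a, lambda lam: lam ** 4)          # equals Tr(A^4)
trace_check = np.trace(np.linalg.matrix_power(a, 4))
```

Repeating the last three lines over many independent draws gives an empirical distribution of the statistic, whose fluctuations stay of order one as $n$ grows, without any normalization by $\sqrt{n}$.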

Theorem. (Sinai and Soshnikov [47], Theorem 2) Let $X_{ij}$ and $A_n$ be as above. Suppose that the $X_{ij}$'s have symmetric distributions around zero, $\mathbb{E}(X_{ij}^2) = 1/4$ for all $i, j$, and there exists a constant $K$ such that for every positive integer $m$ and all $i, j$, $\mathbb{E}(X_{ij}^{2m}) \le (Km)^m$. Let $p_n \to \infty$ as $n \to \infty$ such that $p_n = o(n^{2/3})$. Then 

\[ \mathbb{E}(\mathrm{Tr} A_n^{p_n}) = \begin{cases} 2^{3/2}\, n\, (\pi p_n^3)^{-1/2} (1 + o(1)) & \text{if } p_n \text{ is even,} \\ 0 & \text{if } p_n \text{ is odd,} \end{cases} \]

and the distribution of $\mathrm{Tr} A_n^{p_n} - \mathbb{E}(\mathrm{Tr} A_n^{p_n})$ converges weakly to the normal law $N(0, 1/\pi)$. 
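One way to sanity-check the shape of the leading term $2^{3/2} n (\pi p_n^3)^{-1/2}$ is to compare it with $n$ times the even moments of the semicircle law with variance $1/4$, which are $C_k / 4^k$ with $C_k$ the $k$-th Catalan number and $k = p/2$. This comparison is an assumption of the sketch (it is the standard heuristic for the expected trace, not a statement taken from the theorem), and the function names are hypothetical.

```python
from math import comb, pi, sqrt

def semicircle_even_moment(p):
    """p-th moment (p even) of the semicircle law with variance 1/4: C_k / 4^k with k = p/2."""
    k = p // 2
    catalan = comb(2 * k, k) // (k + 1)
    return catalan / 4 ** k

def leading_term(p):
    """The factor 2^{3/2} (pi p^3)^{-1/2} multiplying n in the displayed formula."""
    return 2 ** 1.5 / sqrt(pi * p ** 3)

# Ratio of the exact semicircle moment to the asymptotic leading term.
ratios = {p: semicircle_even_moment(p) / leading_term(p) for p in (10, 100, 1000)}
```

The ratios increase towards 1 as $p$ grows, consistent with Stirling's approximation $C_k / 4^k \sim (\pi k^3)^{-1/2}$.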

As remarked in [47] and demonstrated in [36], the normal approximation result can be extended to the joint distribution of the traces of various powers, and then to general analytic functions. 

We wish to extend the above result to the scenario where $\mathbb{E}(X_{ij}^2)$ is not the same for all $i, j$. A wide generalization of this problem has been recently investigated by Anderson and Zeitouni [1] under the assumption that $\mathbb{E}(X_{ij}^2) \sim f(i/n, j/n)$ where $f$ is a continuous function on $[0, 1]^2$. Under further assumptions, explicit formulas for the limiting means and variances are also obtained in [1]. 

If the structural assumptions are dropped and we just assume that $\mathbb{E}(X_{ij}^2)$ is bounded above and below by positive constants, then there does not seem to be much hope of getting limiting formulas. Surprisingly, however, Theorem 3.1 still allows us to prove central limit theorems. 

Theorem 4.2. Let $A_n$ be the Wigner matrix defined in (6). Suppose that the $X_{ij}$'s are all in $\mathcal{L}(c_1, c_2)$ for some finite $c_1, c_2$, and have symmetric distributions around zero. Suppose there are two positive constants $c$ and $C$ such that $c \le \mathbb{E}(X_{ij}^2) \le C$ for all $i, j$. Let $p_n$ be a sequence of positive integers such that $p_n = o(\log n)$. Let $W_n = \mathrm{Tr}(A_n^{p_n})$. Then as $n \to \infty$, 

\[ \frac{W_n - \mathbb{E}(W_n)}{\sqrt{\mathrm{Var}(W_n)}} \ \text{converges in total variation to } N(0, 1). \]

Moreover, $\mathrm{Var}(W_n)$ stays bounded away from zero. The same results are true also if $W_n = \mathrm{Tr} f(A_n)$, where $f$ is a fixed nonzero polynomial with nonnegative coefficients. 

Note that the rate of growth allowed for $p_n$ is $o(\log n)$, which is significantly worse than the Sinai-Soshnikov condition $p_n = o(n^{2/3})$. We do not know how to improve that at present. Neither do we know how to produce asymptotic formulas for $\mathbb{E}(W_n)$ and $\mathrm{Var}(W_n)$ as in Anderson and Zeitouni [1]. On the positive side, the assumption that $c \le \mathbb{E}(X_{ij}^2) \le C$ is more general than any available result, as far as we know. In particular, we do not require asymptotic 'continuity' of $\mathbb{E}(X_{ij}^2)$ in $(i, j)$. The proof of Theorem 4.2 will follow from the following finite sample error bound. 

Lemma 4.3. Fix $n$. Let $A = A(X)$ be the Wigner matrix defined in (6). Suppose the $X_{ij}$'s are in $\mathcal{L}(c_1, c_2)$ for some finite $c_1, c_2$. Take an entire function $f$ and define $f_1$, $f_2$ as in Theorem 3.1. Let $\lambda$ denote the spectral radius of $A$. Let $a = (\mathbb{E} f_1(\lambda)^4)^{1/4}$ and $b = (\mathbb{E} f_2(\lambda)^4)^{1/4}$. Suppose $W = \mathrm{Re}\,\mathrm{Tr} f(A)$ has finite fourth moment and let $\sigma^2 = \mathrm{Var}(W)$. Let $Z$ be a normal random variable with the same mean and variance as $W$. Then 



Remarks. (i) It is well known that under mild conditions, $\lambda$ converges to a finite limit as $n \to \infty$ (see e.g. [4], Section 2.2.1). Even exponentially decaying tail bounds are available [27]. Thus $a$ and $b$ are generally $O(1)$ in the above bound. 

(ii) Sinai and Soshnikov ([46], Corollary 1) showed that $\sigma^2$ converges to a finite limit under certain conditions on $f$ and the distribution of the $X_{ij}$'s. If these conditions are satisfied and the limit is nonzero, then we get a bound of order $1/\sqrt{n}$. Moreover, for gaussian Wigner matrices we have $c_2 = 0$ and hence a bound of order $1/n$. The difference between the gaussian and non-gaussian cases is not an accident. With $f(z) = z$, we have 

\[ W = \mathrm{Tr}(A_n) = \frac{1}{\sqrt{n}} \sum_{i=1}^n X_{ii}. \]

In this case we know that the error bound in the non-gaussian case is exactly of order $1/\sqrt{n}$. 

Before proving Lemma 4.3, let us first prove Theorem 4.2 using the lemma. The main difference between Lemma 4.3 and Theorem 4.2 is that the assumption of symmetry on the distributions of the entries allows us to compute a lower bound on the unknown quantity $\sigma^2$ and actually prove a CLT in Theorem 4.2. 





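Proof of Theorem 4.2. Let $\sigma_{ij}^2 = \mathbb{E}(X_{ij}^2)$, and let 

\[ \xi_{ij} = \frac{X_{ij}}{\sigma_{ij}}. \]

Let $\Xi_n$ denote the matrix $n^{-1/2}(\xi_{ij})_{1 \le i, j \le n}$. Now take any collections of nonnegative integers $(\alpha_{ij})_{1 \le i \le j \le n}$ and $(\beta_{ij})_{1 \le i \le j \le n}$. Then 

\[ \mathrm{Cov}\Big(\prod \xi_{ij}^{\alpha_{ij}}, \prod \xi_{ij}^{\beta_{ij}}\Big) = \prod \mathbb{E}(\xi_{ij}^{\alpha_{ij} + \beta_{ij}}) - \prod \mathbb{E}(\xi_{ij}^{\alpha_{ij}}) \mathbb{E}(\xi_{ij}^{\beta_{ij}}), \]

where the products are taken over $1 \le i \le j \le n$. Now note that if $\alpha_{ij} + \beta_{ij}$ is odd, then $\mathbb{E}(\xi_{ij}^{\alpha_{ij} + \beta_{ij}}) = \mathbb{E}(\xi_{ij}^{\alpha_{ij}}) \mathbb{E}(\xi_{ij}^{\beta_{ij}}) = 0$. If $\alpha_{ij}$ and $\beta_{ij}$ are both odd, then $\mathbb{E}(\xi_{ij}^{\alpha_{ij} + \beta_{ij}}) \ge 0$ and $\mathbb{E}(\xi_{ij}^{\alpha_{ij}}) = \mathbb{E}(\xi_{ij}^{\beta_{ij}}) = 0$. Finally, if $\alpha_{ij}$ and $\beta_{ij}$ are both even (the case $\alpha_{ij} = \beta_{ij} = 0$ being trivial), then by Hölder's inequality 

\[ \mathbb{E}(\xi_{ij}^{\alpha_{ij}}) \mathbb{E}(\xi_{ij}^{\beta_{ij}}) \le \big(\mathbb{E}(\xi_{ij}^{\alpha_{ij} + \beta_{ij}})\big)^{\frac{\alpha_{ij}}{\alpha_{ij} + \beta_{ij}}} \big(\mathbb{E}(\xi_{ij}^{\alpha_{ij} + \beta_{ij}})\big)^{\frac{\beta_{ij}}{\alpha_{ij} + \beta_{ij}}} = \mathbb{E}(\xi_{ij}^{\alpha_{ij} + \beta_{ij}}). \]

Thus, under all circumstances, we have 

\[ \mathbb{E}(\xi_{ij}^{\alpha_{ij} + \beta_{ij}}) \ge \mathbb{E}(\xi_{ij}^{\alpha_{ij}}) \mathbb{E}(\xi_{ij}^{\beta_{ij}}) \ge 0. \tag{7} \]

Therefore, 

\[ \mathrm{Cov}\Big(\prod X_{ij}^{\alpha_{ij}}, \prod X_{ij}^{\beta_{ij}}\Big) = \Big(\prod \sigma_{ij}^{\alpha_{ij} + \beta_{ij}}\Big) \mathrm{Cov}\Big(\prod \xi_{ij}^{\alpha_{ij}}, \prod \xi_{ij}^{\beta_{ij}}\Big) \ge c^{\frac{1}{2}\sum (\alpha_{ij} + \beta_{ij})}\, \mathrm{Cov}\Big(\prod \xi_{ij}^{\alpha_{ij}}, \prod \xi_{ij}^{\beta_{ij}}\Big). \]

From this, it follows easily that for any positive integer $p_n$, 

\[ \mathrm{Var}(\mathrm{Tr} A_n^{p_n}) \ge c^{p_n} \mathrm{Var}(\mathrm{Tr}\, \Xi_n^{p_n}). \]

Now, by (7), 

\[ \mathrm{Var}(\mathrm{Tr}\, \Xi_n^{p_n}) \ge \frac{1}{n^{p_n}} \sum_{1 \le i_1, \ldots, i_{p_n} \le n} \mathrm{Var}(\xi_{i_1 i_2} \xi_{i_2 i_3} \cdots \xi_{i_{p_n} i_1}). \]

If $i_1, \ldots, i_{p_n}$ are distinct numbers, then 

\[ \mathrm{Var}(\xi_{i_1 i_2} \xi_{i_2 i_3} \cdots \xi_{i_{p_n} i_1}) = \mathbb{E}(\xi_{i_1 i_2}^2) \mathbb{E}(\xi_{i_2 i_3}^2) \cdots \mathbb{E}(\xi_{i_{p_n} i_1}^2) = 1. \]

Thus, 

\[ \mathrm{Var}(\mathrm{Tr}\, \Xi_n^{p_n}) \ge \frac{n(n-1) \cdots (n - p_n + 1)}{n^{p_n}}, \]

and so, if $p_n$ is a sequence of integers such that $p_n = o(n^{1/2})$, then 

\[ \mathrm{Var}(\mathrm{Tr} A_n^{p_n}) \ge K c^{p_n}, \tag{8} \]

where $K$ is a positive constant that does not vary with $n$. 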

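Now note that for any nonnegative integer $\alpha_{ij}$, $\mathbb{E}(\xi_{ij}^{\alpha_{ij}}) \ge 0$. Thus,

$$\mathbb{E}\Bigl(\prod X_{ij}^{\alpha_{ij}}\Bigr) \le C^{\frac12\sum\alpha_{ij}}\,\mathbb{E}\Bigl(\prod \xi_{ij}^{\alpha_{ij}}\Bigr).$$

In particular, for any positive integer $l$,

$$\mathbb{E}(\mathrm{Tr}\,A_n^{l}) \le C^{l/2}\,\mathbb{E}(\mathrm{Tr}\,\Xi_n^{l}).$$

Let $\lambda_n$ denote the spectral radius of $A_n$. Then for any positive integer $m$
and any positive even integer $l$,

$$\mathbb{E}(\lambda_n^{m}) \le \bigl(\mathbb{E}(\mathrm{Tr}\,A_n^{lm})\bigr)^{1/l} \le C^{m/2}\bigl(\mathbb{E}(\mathrm{Tr}\,\Xi_n^{lm})\bigr)^{1/l}.$$

Now let $l = l_n := 2[\log n]$. If $m_n$ is a sequence of positive integers such that
$m_n = o(n^{2/3}/\log n)$, it follows from the Sinai-Soshnikov result stated above
that for all $n$,

$$\bigl(\mathbb{E}(\mathrm{Tr}\,\Xi_n^{l_nm_n})\bigr)^{1/l_n} \le \bigl(K' 2^{l_nm_n} n\bigr)^{1/l_n} \le K 2^{m_n},$$

where $K'$ and $K$ are constants that do not depend on $n$. Note that we could
apply the theorem because the $\xi_{ij}$'s are symmetric and $\mathbb{E}(\xi_{ij}^{2m}) \le (Km)^m$ for all
$m$ due to the $\mathcal{L}(c_1,c_2)$ assumption. The $2^{m_n}$ term arises because $\mathbb{E}(\xi_{ij}^2) = 1$
instead of $1/4$ as required in the Sinai-Soshnikov theorem. Combined with
the previous step, this gives

$$\mathbb{E}(\lambda_n^{m_n}) \le K(4C)^{m_n/2}. \tag{9}$$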

Now let us apply Lemma 4.3 to $W = \mathrm{Tr}\,A_n^{p_n}$. First, let us fix $n$. We have
$f(x) = x^{p_n}$, and hence $f_1(x) = p_nx^{p_n-1}$ and $f_2(x) = p_n(p_n-1)x^{p_n-2}$. It
follows that both $a^2$ and $ab$ are bounded by $p_n^2(\mathbb{E}(\lambda_n^{4p_n}))^{1/2}$, which according
to (9), is bounded by $Kp_n^2(4C)^{p_n}$. On the other hand, by (8), $\sigma^2$ is lower
bounded by $Kc^{p_n}$. Combining, and using Lemma 4.3, we get

$$d_{TV}(W, Z) \le \frac{K p_n^2 (4C)^{p_n}}{c^{p_n}\sqrt{n}},$$

where $K$ is a constant depending only on $c$, $C$, $c_1$ and $c_2$, and $Z$ is a gaussian
random variable with the same mean and variance as $W$. If $p_n = o(\log n)$,
the bound goes to zero.

When $W_n = \mathrm{Tr}\,f(A_n)$, where $f$ is a fixed polynomial with nonnegative
coefficients, the proof goes through almost verbatim, and is in fact simpler.
The nonnegativity of the coefficients is required to ensure that all monomial
terms are positively correlated, so that we can get a lower bound on the
variance. □

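Proof of Lemma 4.3. Let $\mathcal{I} = \{(i,j) : 1\le i\le j\le n\}$. Let $x = (x_{ij})_{1\le i\le j\le n}$
denote a typical element of $\mathbb{R}^{\mathcal{I}}$. For each such $x$, let $A(x) = (a_{ij}(x))_{1\le i,j\le n}$
denote the matrix whose $(i,j)$th element is $n^{-1/2}x_{ij}$ if $i\le j$ and $n^{-1/2}x_{ji}$ if
$i > j$. Then the matrix $A$ considered above is simply $A(X)$, and this puts
us in the setting of Theorem 3.1. Now,

$$\frac{\partial a_{ij}}{\partial x_{kl}} = \begin{cases} n^{-1/2} & \text{if } (i,j) = (k,l) \text{ or } (i,j) = (l,k), \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, for any matrix $B$ with $\|B\| = 1$, and $1\le k < l\le n$,

$$\Bigl|\mathrm{Tr}\Bigl(B\frac{\partial A}{\partial x_{kl}}\Bigr)\Bigr| = \frac{|b_{kl} + b_{lk}|}{\sqrt{n}} \le \frac{2}{\sqrt{n}}.$$

It is clear that the same bound holds even if $k = l$. Thus,

$$\gamma_0(x) \le \frac{2}{\sqrt{n}} \quad \text{for all } x \in \mathbb{R}^{\mathcal{I}}.$$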



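Next, let $\mathcal{R}$ and $\mathcal{S}$ be as in (3), and take any $\alpha \in \mathcal{R}$, $\beta \in \mathcal{S}$. Then by the
Cauchy-Schwarz inequality, we have

$$\Bigl|\sum_{(k,l)\in\mathcal{I}}\sum_{i,j=1}^n \alpha_{kl}\beta_{ij}\frac{\partial a_{ij}}{\partial x_{kl}}\Bigr| \le \frac{1}{\sqrt{n}}\sum_{(k,l)\in\mathcal{I}} |\alpha_{kl}|\bigl(|\beta_{kl}| + |\beta_{lk}|\bigr) \le \frac{2}{\sqrt{n}}.$$

Thus,

$$\gamma_1(x) \le \frac{2}{\sqrt{n}} \quad \text{for all } x \in \mathbb{R}^{\mathcal{I}}.$$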



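Now, it is clear that $\gamma_2(x) = 0$ and $r(x) \le n$. Thus, if we define $\eta_0$, $\eta_1$, and
$\eta_2$ as in Theorem 3.1, and let $\lambda(x)$ be the spectral radius of $A(x)$, then for
all $x \in \mathbb{R}^{\mathcal{I}}$ we have

$$\eta_0(x) \le \frac{2f_1(\lambda(x))}{\sqrt{n}}, \qquad \eta_1(x) \le 2f_1(\lambda(x)), \qquad \text{and} \qquad \eta_2(x) \le \frac{4f_2(\lambda(x))}{n}.$$

This gives

$$\kappa_0 \le \frac{4(\mathbb{E}f_1(\lambda)^4)^{1/2}}{\sqrt{n}}, \qquad \kappa_1 \le 2(\mathbb{E}f_1(\lambda)^4)^{1/4}, \qquad \text{and} \qquad \kappa_2 \le \frac{4(\mathbb{E}f_2(\lambda)^4)^{1/4}}{n}.$$

Plugging these values into Theorem 3.1, we get the result. □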

4.2. Gaussian matrices with correlated entries. Suppose we have a
collection $X = (X_{ij})_{1\le i,j\le n}$ of jointly gaussian random variables with mean
zero and $n^2\times n^2$ covariance matrix $\Sigma$. Let $A = n^{-1/2}(X_{ij})_{1\le i,j\le n}$. Note
that $A$ may be non-symmetric. The limiting behavior of the spectrum of such
matrices has been recently investigated by Anderson and Zeitouni [2] under
special structures on $\Sigma$. We have the following general result.

Proposition 4.4. Take an entire function $f$ and define $f_1$, $f_2$ as in
Theorem 3.1. Let $\lambda$ denote the operator norm of $A$. Let $a = (\mathbb{E}f_1(\lambda)^4)^{1/4}$ and
$b = (\mathbb{E}f_2(\lambda)^4)^{1/4}$. Suppose $W = \mathrm{Re}\,\mathrm{Tr}\,f(A)$ has finite fourth moment and
let $\sigma^2 = \mathrm{Var}(W)$. Let $Z$ be a normal random variable with the same mean
and variance as $W$. Then

$$d_{TV}(W, Z) \le \frac{2\sqrt{5}\,\|\Sigma\|^{3/2}ab}{\sigma^2 n}.$$

Proof. The computations of $\kappa_1$ and $\kappa_2$ are exactly the same as for Wigner
matrices. The only difference is that we now apply the second part of
Theorem 3.1. □

Of course, the limiting behavior of $\sigma^2$ is not known, so this does not prove
a central limit theorem as long as such results are not established. The term
$\|\Sigma\|^{3/2}$ can often be handled by the well-known Gershgorin bound for the
operator norm:

$$\|\Sigma\| \le \max_{1\le i,j\le n} \sum_{k,l=1}^n |\sigma_{ij,kl}|,$$

where $\sigma_{ij,kl} = \mathrm{Cov}(X_{ij}, X_{kl})$. The next example gives a concrete application
of the above result.
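As a quick numerical illustration of the Gershgorin bound, the following sketch compares the operator norm of a covariance matrix with its largest absolute row sum. The matrix `sigma` is a hypothetical example generated at random, and the helper names are ours; nothing here is specific to the matrix ensembles of this paper.

```python
import numpy as np

def max_abs_row_sum(sigma: np.ndarray) -> float:
    """Gershgorin-type bound: the largest absolute row sum of sigma."""
    return float(np.abs(sigma).sum(axis=1).max())

def operator_norm(sigma: np.ndarray) -> float:
    """Operator (spectral) norm, i.e. the largest singular value."""
    return float(np.linalg.norm(sigma, 2))

rng = np.random.default_rng(0)
g = rng.standard_normal((6, 6))
sigma = g @ g.T  # hypothetical covariance matrix (symmetric, positive semidefinite)

bound = max_abs_row_sum(sigma)
norm = operator_norm(sigma)
print(f"operator norm = {norm:.4f}, max absolute row sum = {bound:.4f}")
```

For a symmetric matrix the operator norm equals the spectral radius, which is why the row-sum bound applies directly.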

4.3. Gaussian Toeplitz matrices. Fix a number $n$ and let $X_0, \dots, X_{n-1}$
be independent standard gaussian random variables. Let $A_n$ be the matrix

$$A_n := n^{-1/2}(X_{|i-j|})_{1\le i,j\le n}.$$

This is a gaussian Toeplitz matrix, of the kind recently considered in Bryc,
Dembo, and Jiang [13] and also in M. Meckes [40] and Bose and Sen [11].
Although Toeplitz determinants have been extensively studied (see e.g.
Basor [7] and references therein), to the best of our knowledge, there are no
existing central limit theorems for general linear statistics of eigenvalues of
random Toeplitz matrices. We have the following result.

Theorem 4.5. Consider the gaussian Toeplitz matrices defined above. Let
$p_n$ be a sequence of positive integers such that $p_n = o(\log n/\log\log n)$. Let
$W_n = \mathrm{Tr}(A_n^{p_n})$. Then, as $n \to \infty$,

$$\frac{W_n - \mathbb{E}(W_n)}{\sqrt{\mathrm{Var}(W_n)}} \ \text{ converges in total variation to } N(0,1).$$

Moreover, there exists a positive constant $C$ such that $\mathrm{Var}(W_n) \ge (C/p_n)^{p_n} n$
for all $n$. The central limit theorem also holds for $W_n = \mathrm{Tr}\,f(A_n)$, when $f$
is a fixed nonzero polynomial with nonnegative coefficients. In that case,
$\mathrm{Var}(W_n) \ge Cn$ for some positive constant $C$ depending on $f$.

Remarks. (i) Note that the theorem is only for gaussian Toeplitz matrices.
In fact, considering the function $f(x) = x$, we see that a CLT need not
always hold for linear statistics of non-gaussian Toeplitz matrices.

(ii) This is an example of a matrix ensemble where nothing is known
about the limiting formula for $\mathrm{Var}(W_n)$. Theorem 3.1 enables us to prove
the CLT even without knowing the limit of $\mathrm{Var}(W_n)$. As before, this is
possible because we can easily get lower bounds on $\mathrm{Var}(W_n)$.
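The ensemble of this subsection is easy to simulate. The sketch below builds one sample of the gaussian Toeplitz matrix $n^{-1/2}(X_{|i-j|})$ and evaluates the linear statistic $\mathrm{Tr}(A_n^p)$ from its eigenvalues. The function names, the seed, and the choices $n = 50$, $p = 4$ are illustrative assumptions, not part of the theorem.

```python
import numpy as np

def gaussian_toeplitz(n: int, rng: np.random.Generator) -> np.ndarray:
    """Return A_n = n^{-1/2} (X_{|i-j|}) with X_0, ..., X_{n-1} i.i.d. N(0, 1)."""
    x = rng.standard_normal(n)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return x[idx] / np.sqrt(n)

def trace_power(a: np.ndarray, p: int) -> float:
    """Linear statistic W = Tr(A^p), computed from the eigenvalues of symmetric A."""
    eigenvalues = np.linalg.eigvalsh(a)
    return float(np.sum(eigenvalues ** p))

rng = np.random.default_rng(1)
a = gaussian_toeplitz(50, rng)
w = trace_power(a, 4)
print(f"Tr(A^4) for one sample with n = 50: {w:.4f}")
```

Repeating the last three lines over many independent samples gives an empirical distribution of $W_n$ that can be compared with a gaussian; that experiment is left out here to keep the sketch short.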

Proof. In the notation of the previous subsection,

$$\sigma_{ij,kl} = \begin{cases} 1 & \text{if } |i-j| = |k-l|, \\ 0 & \text{otherwise.} \end{cases}$$

Thus,

$$\|\Sigma\| \le \max_{i,j} \sum_{k,l} |\sigma_{ij,kl}| \le 2n.$$

Let $\lambda_n$ denote the spectral norm of $A_n$. Using Proposition 4.4 and the above
bound on $\|\Sigma\|$, we have

$$d_{TV}(W_n, Z_n) \le \frac{C p_n^2 (\mathbb{E}\lambda_n^{4p_n})^{1/2}\sqrt{n}}{\mathrm{Var}(W_n)}, \tag{10}$$

where $Z_n$ is a gaussian random variable with the same mean and variance
as $W_n$ and $C$ is a universal constant.

In the rest of the argument we will write $p$ instead of $p_n$ to ease notation.
First, note that

$$W_n = \mathrm{Tr}(A_n^p) = n^{-p/2} \sum_{1\le i_1,\dots,i_p\le n} X_{|i_1-i_2|}X_{|i_2-i_3|}\cdots X_{|i_p-i_1|}.$$

As in the proof of Theorem 4.2, it is easy to verify that all terms in the above
sum are positively correlated with each other, and hence, for any partition
$\mathcal{D}$ of the set $\{1,\dots,n\}^p$ into disjoint subcollections,

$$\mathrm{Var}(W_n) \ge n^{-p} \sum_{D\in\mathcal{D}} \mathrm{Var}\Bigl(\sum_{(i_1,\dots,i_p)\in D} X_{|i_1-i_2|}X_{|i_2-i_3|}\cdots X_{|i_p-i_1|}\Bigr). \tag{11}$$

For any collection of distinct positive integers $1\le a_1,\dots,a_{p-1}\le \lceil n/3p\rceil$,
let $D_{a_1,\dots,a_{p-1}}$ be the set of all $1\le i_1,\dots,i_p\le n$ such that $i_{k+1}-i_k = a_k$ for
$k = 1,\dots,p-1$ and $1\le i_1\le \lceil n/3\rceil$. Clearly, $|D_{a_1,\dots,a_{p-1}}| = \lceil n/3\rceil$. Again,
since the $a_i$'s are distinct,

$$\mathrm{Var}\Bigl(\sum_{(i_1,\dots,i_p)\in D_{a_1,\dots,a_{p-1}}} X_{|i_1-i_2|}X_{|i_2-i_3|}\cdots X_{|i_p-i_1|}\Bigr) = |D_{a_1,\dots,a_{p-1}}|^2\,\mathrm{Var}(X_{a_1}X_{a_2}\cdots X_{a_{p-1}}X_{a_1+\cdots+a_{p-1}}) = |D_{a_1,\dots,a_{p-1}}|^2 \ge \frac{n^2}{9}.$$

Next, note that the number of ways to choose $a_1,\dots,a_{p-1}$ satisfying the
restrictions is

$$\lceil n/3p\rceil(\lceil n/3p\rceil - 1)\cdots(\lceil n/3p\rceil - p + 2).$$

Since we can assume, without loss of generality, that $n \ge 4p^2$, the above
quantity can be easily seen to be lower bounded by $(n/12p)^{p-1}$. Finally,
noting that if $(a_1,\dots,a_{p-1}) \ne (a_1',\dots,a_{p-1}')$, then $D_{a_1,\dots,a_{p-1}}$ and $D_{a_1',\dots,a_{p-1}'}$
are disjoint, and applying (11), we get

$$\mathrm{Var}(W_n) \ge \Bigl(\frac{C}{p}\Bigr)^p n, \tag{12}$$

where $C$ is a positive universal constant.

Next, let $\lambda_n$ denote the spectral norm of $A_n$. By Theorem 1 of M. Meckes
[40], we know that $\mathbb{E}(\lambda_n) \le C\sqrt{\log n}$. Now, it is easy to verify that the map
$(X_0,\dots,X_{n-1}) \mapsto \lambda_n$ has Lipschitz constant bounded irrespective of $n$. By
standard gaussian concentration results (e.g. Ledoux [39], Sections 5.1-5.2),
it follows that for any $k$,

$$\mathbb{E}|\lambda_n - \mathbb{E}(\lambda_n)|^k \le C^{k/2}k^{k/2},$$

where, again, $C$ is a universal constant. Combining with the result for $\mathbb{E}(\lambda_n)$, it
follows that for any $n$ and $k$,

$$\mathbb{E}(\lambda_n^k) \le (Ck\log n)^{k/2}.$$

Thus, the term $p^2(\mathbb{E}\lambda_n^{4p})^{1/2}$ in (10) is bounded by $p^2(Cp\log n)^p$. Therefore,
from (10) and (12), it follows that

$$d_{TV}(W_n, Z_n) \le \frac{C^pp^{2p+2}(\log n)^p}{\sqrt{n}},$$

where $C$ is a universal constant. Clearly, if $p = o(\log n/\log\log n)$, this goes
to zero. This completes the proof for $W_n = \mathrm{Tr}(A_n^{p_n})$. When $W_n = \mathrm{Tr}\,f(A_n)$,
where $f$ is a fixed polynomial with nonnegative coefficients, the proof goes
through exactly as above. If $f(x) = c_0 + \cdots + c_kx^k$, the nonnegativity of
the coefficients ensures that $\mathrm{Var}(W_n) \ge c_k^2\,\mathrm{Var}(\mathrm{Tr}\,A_n^k)$, and we can re-use
the bounds computed before to show that $\mathrm{Var}(W_n) \ge C(f)n$. The rest is
similar. □

4.4. Wishart matrices. Let $n \le N$ be two positive integers, and let
$X = (X_{ij})_{1\le i\le n,\,1\le j\le N}$ be a collection of independent random variables in
$\mathcal{L}(c_1,c_2)$ for some finite $c_1, c_2$. Let

$$A = N^{-1}XX^t.$$

In statistical parlance, the matrix $A$ is called the Wishart matrix or sample
covariance matrix corresponding to the data matrix $X$. Just as in the
Wigner case, linear statistics of eigenvalues of Wishart matrices also
satisfy unnormalized central limit theorems under certain conditions. This was
proved for polynomial $f$ by Jonsson [36], and for a much larger class of
functions in Bai and Silverstein [6]. A different proof was recently given by
Anderson and Zeitouni [2]. We have the following error bound.

Proposition 4.6. Let $\lambda$ be the largest eigenvalue of $A$. Take any entire
function $f$ and define $f_1$, $f_2$ as in Theorem 3.1. Let $a = (\mathbb{E}(f_1(\lambda)^4\lambda^2))^{1/4}$
and $b = (\mathbb{E}(f_1(\lambda) + 2n^{-1/2}f_2(\lambda)\lambda)^4)^{1/4}$. Suppose $W = \mathrm{Re}\,\mathrm{Tr}\,f(A)$ has finite
fourth moment and let $\sigma^2 = \mathrm{Var}(W)$. Let $Z$ be a normal random variable
with the same mean and variance as $W$. Then

$$d_{TV}(W, Z) \le \frac{8\sqrt{5}}{\sigma^2}\Bigl(\frac{c_1c_2a^2\sqrt{n}}{N} + \frac{c_1^3abn}{N^{3/2}}\Bigr).$$

If we now change the setup and assume that the entries of $X$ are jointly
gaussian with mean $0$ and $nN\times nN$ covariance matrix $\Sigma$, keeping all other
notation the same, then the corresponding bound is

$$d_{TV}(W, Z) \le \frac{8\sqrt{5}\,\|\Sigma\|^{3/2}abn}{\sigma^2N^{3/2}}.$$

Remarks. (i) As in the Wigner case, it is well known that under mild
conditions, $\lambda = O(1)$ as $n, N \to \infty$ with $n/N \to c \in [0,1)$. We refer to
Section 2.2.2 in the survey article [4] for details. It follows that $a$ and $b$
are $O(1)$.

(ii) It is shown in [6] that in the case of independent entries, if $n/N \to
c \in (0,1)$, then $\sigma^2$ converges to a finite positive constant under fairly general
conditions (an explicit formula for the limit is also available). Therefore
under such conditions, the first bound above is of order $1/\sqrt{N}$.

(iii) We should remark that the spectrum of $XX^t$ is often studied by
studying the block matrix

$$\begin{pmatrix} 0 & X \\ X^t & 0 \end{pmatrix},$$

because

$$\begin{pmatrix} 0 & X \\ X^t & 0 \end{pmatrix}^2 = \begin{pmatrix} XX^t & 0 \\ 0 & X^tX \end{pmatrix}.$$

Thus, in principle, we can derive Proposition 4.6 using the information
contained in Lemma 4.3. However, for expository purposes, we prefer to carry out
the explicit computations necessary for applying Theorem 3.1 without
resorting to the above trick. The computations will also be helpful in dealing
with the double Wishart case in the next subsection.
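The block-matrix identity in remark (iii) can be checked numerically. The sketch below does this for a small random $X$ (the dimensions and seed are arbitrary choices of ours) and also confirms the standard consequence that the largest eigenvalues of the block matrix are the singular values of $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 5
X = rng.standard_normal((n, N))

# Symmetric (n + N) x (n + N) block matrix with X and X^t off the diagonal.
block = np.block([[np.zeros((n, n)), X],
                  [X.T, np.zeros((N, N))]])
block_squared = block @ block

# The nonzero eigenvalues of the block matrix are +/- the singular values of X.
singular_values = np.linalg.svd(X, compute_uv=False)
block_eigenvalues = np.linalg.eigvalsh(block)
print(np.sort(block_eigenvalues)[-n:], np.sort(singular_values))
```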



Proof of Proposition 4.6. First, let us define the indexing set

$$\mathcal{I} = \{(p,q) : p = 1,\dots,n,\ q = 1,\dots,N\}.$$

From now on, we simply write $pq$ instead of $(p,q)$. Let $x = (x_{pq})_{pq\in\mathcal{I}}$ be a
typical element of $\mathbb{R}^{\mathcal{I}}$. In the following, the collection $x$ is used as a matrix,
and it seems that the only way to avoid confusion is to write $X$ instead of $x$,
so we do that. Generally, there is no harm in confusing this $X$ with the
collection of random variables defined at the onset.

Let $\gamma_0$, $\gamma_1$, and $\gamma_2$ be defined as in Theorem 3.1. For each $m$ and $i$, let $e_{mi}$ be the
$i$th coordinate vector in $\mathbb{R}^m$, i.e. the vector whose $i$th component is 1 and
the rest are zero. Then

$$\frac{\partial A}{\partial x_{pq}} = N^{-1}(e_{np}e_{Nq}^tX^t + Xe_{Nq}e_{np}^t)$$

and

$$\frac{\partial^2A}{\partial x_{pq}\partial x_{rs}} = N^{-1}(e_{np}e_{Nq}^te_{Ns}e_{nr}^t + e_{nr}e_{Ns}^te_{Nq}e_{np}^t) = \begin{cases} N^{-1}(e_{np}e_{nr}^t + e_{nr}e_{np}^t) & \text{if } q = s, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

Now take any $n\times n$ matrix $B$ with $\|B\| = 1$. Then for any $p, q$,

$$\Bigl|\mathrm{Tr}\Bigl(B\frac{\partial A}{\partial x_{pq}}\Bigr)\Bigr| = N^{-1}\bigl|\mathrm{Tr}(Be_{np}e_{Nq}^tX^t + BXe_{Nq}e_{np}^t)\bigr| = N^{-1}\bigl|e_{Nq}^tX^tBe_{np} + e_{np}^tBXe_{Nq}\bigr| \le 2N^{-1}\|B\|\|X\|.$$

This shows that

$$\gamma_0 \le 2\sqrt{\frac{\lambda}{N}}.$$

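Next, let $\alpha = (\alpha_{pq})_{1\le p\le n,\,1\le q\le N}$, $\alpha' = (\alpha'_{pq})_{1\le p\le n,\,1\le q\le N}$, and $\beta = (\beta_{ij})_{1\le i,j\le n}$
be arbitrary matrices of complex numbers such that $\|\alpha\|_{HS} = \|\alpha'\|_{HS} =
\|\beta\|_{HS} = 1$. Then

$$\begin{aligned}
\Bigl|\sum_{p=1}^n\sum_{q=1}^N\sum_{i,j=1}^n \alpha_{pq}\beta_{ij}\frac{\partial a_{ij}}{\partial x_{pq}}\Bigr|
&= N^{-1}\Bigl|\sum_{p=1}^n\sum_{q=1}^N\sum_{i,j=1}^n \alpha_{pq}\beta_{ij}\,e_{ni}^t(e_{np}e_{Nq}^tX^t + Xe_{Nq}e_{np}^t)e_{nj}\Bigr| \\
&= N^{-1}\Bigl|\sum_{p=1}^n\sum_{q=1}^N\sum_{j=1}^n \alpha_{pq}\beta_{pj}x_{jq} + \sum_{p=1}^n\sum_{q=1}^N\sum_{i=1}^n \alpha_{pq}\beta_{ip}x_{iq}\Bigr| \\
&= N^{-1}\bigl|\mathrm{Tr}(\alpha X^t\beta^t) + \mathrm{Tr}(\alpha X^t\beta)\bigr|.
\end{aligned}$$

By Lemma 4.1, we have

$$\bigl|\mathrm{Tr}(\alpha X^t\beta^t) + \mathrm{Tr}(\alpha X^t\beta)\bigr| \le 2\|\alpha\|_{HS}\|\beta\|_{HS}\|X\| = 2\sqrt{N\lambda}.$$

Thus,

$$\gamma_1 \le 2\sqrt{\frac{\lambda}{N}}.$$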

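Again, by the formula (13) for second derivatives of $A$,

$$\begin{aligned}
\Bigl|\sum_{p,r=1}^n\sum_{q,s=1}^N\sum_{i,j=1}^n \alpha_{pq}\alpha'_{rs}\beta_{ij}\frac{\partial^2a_{ij}}{\partial x_{pq}\partial x_{rs}}\Bigr|
&= N^{-1}\Bigl|\sum_{p,r=1}^n\sum_{q=1}^N\sum_{i,j=1}^n \alpha_{pq}\alpha'_{rq}\beta_{ij}\,e_{ni}^t(e_{np}e_{nr}^t + e_{nr}e_{np}^t)e_{nj}\Bigr| \\
&= N^{-1}\Bigl|\sum_{p,r=1}^n\sum_{q=1}^N \alpha_{pq}\alpha'_{rq}(\beta_{pr} + \beta_{rp})\Bigr| \\
&= N^{-1}\bigl|\mathrm{Tr}(\alpha\alpha'^t\beta^t) + \mathrm{Tr}(\alpha\alpha'^t\beta)\bigr| \le 2N^{-1}\|\alpha\|_{HS}\|\alpha'\|_{HS}\|\beta\|_{HS}.
\end{aligned}$$

This shows that

$$\gamma_2 \le \frac{2}{N}.$$

Finally, note that $\mathrm{rank}(A) \le n$. Combining the bounds we get

$$\eta_0 \le 2f_1(\lambda)\sqrt{\frac{\lambda}{N}}, \qquad \eta_1 \le 2f_1(\lambda)\sqrt{\frac{\lambda n}{N}}, \qquad \text{and} \qquad \eta_2 \le \frac{2f_1(\lambda)\sqrt{n}}{N} + \frac{4f_2(\lambda)\lambda}{N}.$$

From this, we get

$$\kappa_0 \le \frac{4\sqrt{n}}{N}\bigl(\mathbb{E}(f_1(\lambda)^4\lambda^2)\bigr)^{1/2}, \qquad \kappa_1 \le 2\sqrt{\frac{n}{N}}\bigl(\mathbb{E}(f_1(\lambda)^4\lambda^2)\bigr)^{1/4}, \qquad \text{and}$$

$$\kappa_2 \le \frac{2\sqrt{n}}{N}\bigl(\mathbb{E}(f_1(\lambda) + 2n^{-1/2}f_2(\lambda)\lambda)^4\bigr)^{1/4}.$$

With the aid of Theorem 3.1, this completes the proof. □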

4.5. Double Wishart matrices. Let $n \le N \le M$ be three positive integers,
and let $X = (X_{ij})_{1\le i\le n,\,1\le j\le N}$ and $Y = (Y_{ij})_{1\le i\le n,\,1\le j\le M}$ be two
collections of independent random variables in $\mathcal{L}(c_1,c_2)$ for some finite $c_1, c_2$.
Let

$$A = XX^t(YY^t)^{-1}.$$

A matrix like $A$ is called a double Wishart matrix. Double Wishart matrices
are very important in the statistical theory of canonical correlations (see the
discussion in Section 2.2 of [35]).

If the matrices $X$ and $Y$ had independent standard gaussian entries, the
matrix $XX^t(XX^t + YY^t)^{-1}$ would be known as a Jacobi matrix. In a recent
preprint, Jiang [32] proves the CLT for the Jacobi ensemble. We have the
following result.

Proposition 4.7. Let $\lambda_x$ and $\lambda_y$ be the largest eigenvalues of $N^{-1}XX^t$
and $M^{-1}YY^t$, and let $\delta_y$ be the smallest eigenvalue of $M^{-1}YY^t$. Let $\lambda =
\max\{1, \lambda_x, \lambda_y, \delta_y^{-1}\}$. Take any entire function $f$ and define $f_1$, $f_2$ as in
Theorem 3.1. Let $a = (\mathbb{E}(f_1(\lambda)^4\lambda^{14}))^{1/4}$ and

$$b = \bigl(\mathbb{E}(4f_1(\lambda)\lambda^5 + 2n^{-1/2}f_2(\lambda)\lambda^7)^4\bigr)^{1/4}.$$

Suppose $W = \mathrm{Re}\,\mathrm{Tr}\,f(A)$ has finite fourth moment and let $\sigma^2 = \mathrm{Var}(W)$.
Let $Z$ be a normal random variable with the same mean and variance as $W$.
Then

$$d_{TV}(W, Z) \le \frac{8\sqrt{10}}{\sigma^2}\Bigl(\frac{c_1c_2a^2N\sqrt{n}}{M^2} + \frac{2c_1^3ab\sqrt{N}\,n}{M^2}\Bigr).$$

Remarks. (i) Assume that $n$, $N$, and $M$ grow to infinity at the same rate
(we refer to this as the 'large dimensional limit'). From the results about
the extreme eigenvalues of Wishart matrices ([4], Section 2.2.2), it is clear
that $\lambda = O(1)$, and hence $a$, $b$ are stochastically bounded.

(ii) There are no rigorous results about the behavior of $\sigma^2$ in the large
dimensional limit, other than in the gaussian case, which has been settled
in [32], where it is shown that $\sigma^2$ converges to a finite limit.

(iii) When the entries of $X$ and $Y$ are jointly gaussian and some conditions
on the dimensions are satisfied, the exact joint distribution of the eigenvalues
of $A$ is known (see [35], Section 2.2 for references and an interesting story).
While it may be possible to derive a CLT for the gaussian case using the
explicit form of this density, it is hard to see how the non-gaussian case can
be handled by either the method of moments or Stieltjes transforms.

(iv) In principle, it seems something could be said using the second order
freeness results of Mingo and Speicher [41]. However, to the best of our
knowledge, an explicit CLT for double Wishart matrices has not been worked
out using second order freeness.



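Proof of Proposition 4.7. For convenience, let $C = XX^t$ and $D = YY^t$.
Note that $\|C\| = \|X\|^2 = N\lambda_x$, $\|D\| = \|Y\|^2 = M\lambda_y$ and $\|D^{-1}\| = 1/(M\delta_y)$.
Let the other notation be as in the proof of Proposition 4.6. Now

$$\frac{\partial A}{\partial x_{pq}} = (e_{np}e_{Nq}^tX^t + Xe_{Nq}e_{np}^t)D^{-1}.$$

Again, using the formula

$$\frac{\partial D^{-1}}{\partial y_{pq}} = -D^{-1}\frac{\partial D}{\partial y_{pq}}D^{-1},$$

we have

$$\frac{\partial A}{\partial y_{pq}} = -CD^{-1}\frac{\partial D}{\partial y_{pq}}D^{-1} = -CD^{-1}(e_{np}e_{Mq}^tY^t + Ye_{Mq}e_{np}^t)D^{-1}.$$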
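The formula for $\partial A/\partial y_{pq}$ can be sanity-checked against a central finite difference. The following sketch does so for small random $X$ and $Y$; the dimensions, the seed, the step size `h`, and the helper names `double_wishart` and `dA_dy` are all illustrative assumptions of ours.

```python
import numpy as np

def double_wishart(X, Y):
    """A = X X^t (Y Y^t)^{-1}."""
    return X @ X.T @ np.linalg.inv(Y @ Y.T)

def dA_dy(X, Y, p, q):
    """Closed-form derivative of A with respect to the entry y_{pq} (0-based indices)."""
    n, M = Y.shape
    C = X @ X.T
    D_inv = np.linalg.inv(Y @ Y.T)
    e_p = np.zeros((n, 1)); e_p[p] = 1.0
    e_q = np.zeros((M, 1)); e_q[q] = 1.0
    dD = e_p @ e_q.T @ Y.T + Y @ e_q @ e_p.T  # derivative of D = Y Y^t
    return -C @ D_inv @ dD @ D_inv

rng = np.random.default_rng(3)
n, N, M = 3, 4, 6
X = rng.standard_normal((n, N))
Y = rng.standard_normal((n, M))

p, q, h = 1, 2, 1e-6
Y_plus, Y_minus = Y.copy(), Y.copy()
Y_plus[p, q] += h
Y_minus[p, q] -= h
numeric = (double_wishart(X, Y_plus) - double_wishart(X, Y_minus)) / (2 * h)
exact = dA_dy(X, Y, p, q)
max_error = float(np.max(np.abs(numeric - exact)))
print(f"largest entrywise discrepancy: {max_error:.2e}")
```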



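Now take any $n\times n$ matrix $B$ with $\|B\| = 1$. Then for any $p, q$,

$$\begin{aligned}
\Bigl|\mathrm{Tr}\Bigl(B\frac{\partial A}{\partial x_{pq}}\Bigr)\Bigr|
&= \bigl|\mathrm{Tr}(Be_{np}e_{Nq}^tX^tD^{-1} + BXe_{Nq}e_{np}^tD^{-1})\bigr| \\
&= \bigl|e_{Nq}^tX^tD^{-1}Be_{np} + e_{np}^tD^{-1}BXe_{Nq}\bigr| \\
&\le 2\|D^{-1}\|\|X\| \le \frac{2\lambda^{3/2}\sqrt{N}}{M}.
\end{aligned}$$

Similarly,

$$\Bigl|\mathrm{Tr}\Bigl(B\frac{\partial A}{\partial y_{pq}}\Bigr)\Bigr| \le 2\|C\|\|Y\|\|D^{-1}\|^2 \le \frac{2\lambda^{7/2}N}{M^{3/2}}.$$

Since $\lambda \ge 1$ and $N \le M$,

$$\gamma_0 \le \frac{2\lambda^{7/2}\sqrt{N}}{M}.$$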

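Next, let $\alpha_{n\times N}$, $\tilde\alpha_{n\times M}$, $\alpha'_{n\times N}$, $\tilde\alpha'_{n\times M}$, and $\beta_{n\times n}$ be arbitrary arrays of
complex numbers such that $\|\alpha\|_{HS}^2 + \|\tilde\alpha\|_{HS}^2 = 1$, $\|\alpha'\|_{HS}^2 + \|\tilde\alpha'\|_{HS}^2 = 1$, and
$\|\beta\|_{HS} = 1$. Then

$$\begin{aligned}
\Bigl|\sum_{p=1}^n\sum_{q=1}^N\sum_{i,j=1}^n \alpha_{pq}\beta_{ij}\frac{\partial a_{ij}}{\partial x_{pq}}\Bigr|
&= \Bigl|\sum_{p=1}^n\sum_{q=1}^N\sum_{i,j=1}^n \alpha_{pq}\beta_{ij}\,e_{ni}^t(e_{np}e_{Nq}^tX^t + Xe_{Nq}e_{np}^t)D^{-1}e_{nj}\Bigr| \\
&= \Bigl|\sum_{p=1}^n\sum_{q=1}^N\sum_{j=1}^n \alpha_{pq}\beta_{pj}(X^tD^{-1})_{qj} + \sum_{p=1}^n\sum_{q=1}^N\sum_{i,j=1}^n \alpha_{pq}\beta_{ij}x_{iq}(D^{-1})_{pj}\Bigr| \\
&= \bigl|\mathrm{Tr}(\alpha X^tD^{-1}\beta^t) + \mathrm{Tr}(D^{-1}\beta^tX\alpha^t)\bigr| \\
&\le 2\|\alpha\|_{HS}\|\beta\|_{HS}\|X\|\|D^{-1}\| \le \frac{2\|\alpha\|_{HS}\lambda^{3/2}\sqrt{N}}{M}.
\end{aligned}$$

Similarly,

$$\begin{aligned}
\Bigl|\sum_{p=1}^n\sum_{q=1}^M\sum_{i,j=1}^n \tilde\alpha_{pq}\beta_{ij}\frac{\partial a_{ij}}{\partial y_{pq}}\Bigr|
&= \Bigl|\sum_{p=1}^n\sum_{q=1}^M\sum_{i,j=1}^n \tilde\alpha_{pq}\beta_{ij}\,e_{ni}^tCD^{-1}(e_{np}e_{Mq}^tY^t + Ye_{Mq}e_{np}^t)D^{-1}e_{nj}\Bigr| \\
&= \Bigl|\sum_{p=1}^n\sum_{q=1}^M\sum_{i,j=1}^n \tilde\alpha_{pq}\beta_{ij}\bigl((CD^{-1})_{ip}(Y^tD^{-1})_{qj} + (CD^{-1}Y)_{iq}(D^{-1})_{pj}\bigr)\Bigr| \\
&\le 2\|\tilde\alpha\|_{HS}\|\beta\|_{HS}\|Y\|\|C\|\|D^{-1}\|^2 \le \frac{2\|\tilde\alpha\|_{HS}\lambda^{7/2}N}{M^{3/2}}.
\end{aligned}$$

Combining, and using the inequality

$$\|\alpha\|_{HS} + \|\tilde\alpha\|_{HS} \le \sqrt{2(\|\alpha\|_{HS}^2 + \|\tilde\alpha\|_{HS}^2)} = \sqrt{2},$$

we get

$$\gamma_1 \le \frac{2\sqrt{2}\,\lambda^{7/2}\sqrt{N}}{M}.$$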

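Next, let us compute the second derivatives. First, note that

$$\frac{\partial^2A}{\partial x_{pq}\partial x_{rs}} = \begin{cases} (e_{np}e_{nr}^t + e_{nr}e_{np}^t)D^{-1} & \text{if } q = s, \\ 0 & \text{otherwise.} \end{cases}$$

Using Lemma 4.1 in the last step below, we get

$$\begin{aligned}
\Bigl|\sum_{p,r=1}^n\sum_{q,s=1}^N\sum_{i,j=1}^n \alpha_{pq}\alpha'_{rs}\beta_{ij}\frac{\partial^2a_{ij}}{\partial x_{pq}\partial x_{rs}}\Bigr|
&= \Bigl|\sum_{p,r=1}^n\sum_{q=1}^N\sum_{i,j=1}^n \alpha_{pq}\alpha'_{rq}\beta_{ij}\,e_{ni}^t(e_{np}e_{nr}^t + e_{nr}e_{np}^t)D^{-1}e_{nj}\Bigr| \\
&= \Bigl|\sum_{p,r=1}^n\sum_{q=1}^N\sum_{j=1}^n \alpha_{pq}\alpha'_{rq}\beta_{pj}(D^{-1})_{rj} + \sum_{p,r=1}^n\sum_{q=1}^N\sum_{j=1}^n \alpha_{pq}\alpha'_{rq}\beta_{rj}(D^{-1})_{pj}\Bigr| \\
&\le 2\|\alpha\|_{HS}\|\alpha'\|_{HS}\|\beta\|_{HS}\|D^{-1}\|.
\end{aligned}$$

Thus, we have

$$\Bigl|\sum_{p,r=1}^n\sum_{q,s=1}^N\sum_{i,j=1}^n \alpha_{pq}\alpha'_{rs}\beta_{ij}\frac{\partial^2a_{ij}}{\partial x_{pq}\partial x_{rs}}\Bigr| \le \frac{2\|\alpha\|_{HS}\|\alpha'\|_{HS}\lambda}{M}. \tag{14}$$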

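Next, note that

$$\frac{\partial^2A}{\partial x_{pq}\partial y_{rs}} = -(e_{np}e_{Nq}^tX^t + Xe_{Nq}e_{np}^t)D^{-1}(e_{nr}e_{Ms}^tY^t + Ye_{Ms}e_{nr}^t)D^{-1}.$$

When we open up the brackets in the above expression, we get four terms.
Let us deal with the first term:

$$\begin{aligned}
\Bigl|\sum_{p,r=1}^n\sum_{q=1}^N\sum_{s=1}^M\sum_{i,j=1}^n \alpha_{pq}\tilde\alpha'_{rs}\beta_{ij}\,e_{ni}^te_{np}e_{Nq}^tX^tD^{-1}e_{nr}e_{Ms}^tY^tD^{-1}e_{nj}\Bigr|
&= \Bigl|\sum_{p,r=1}^n\sum_{q=1}^N\sum_{s=1}^M\sum_{j=1}^n \alpha_{pq}\tilde\alpha'_{rs}\beta_{pj}(X^tD^{-1})_{qr}(Y^tD^{-1})_{sj}\Bigr| \\
&= \bigl|\mathrm{Tr}(\alpha X^tD^{-1}\tilde\alpha'Y^tD^{-1}\beta^t)\bigr| \le \frac{\|\alpha\|_{HS}\|\tilde\alpha'\|_{HS}\lambda^3\sqrt{N}}{M^{3/2}}.
\end{aligned}$$

It can be similarly verified that the same bound holds for the other three
terms as well. Combining, we get

$$\Bigl|\sum_{p,r=1}^n\sum_{q=1}^N\sum_{s=1}^M\sum_{i,j=1}^n \alpha_{pq}\tilde\alpha'_{rs}\beta_{ij}\frac{\partial^2a_{ij}}{\partial x_{pq}\partial y_{rs}}\Bigr| \le \frac{4\|\alpha\|_{HS}\|\tilde\alpha'\|_{HS}\lambda^3\sqrt{N}}{M^{3/2}}. \tag{15}$$

Finally, note that

$$\begin{aligned}
\frac{\partial^2A}{\partial y_{pq}\partial y_{rs}}
&= CD^{-1}(e_{np}e_{Mq}^tY^t + Ye_{Mq}e_{np}^t)D^{-1}(e_{nr}e_{Ms}^tY^t + Ye_{Ms}e_{nr}^t)D^{-1} \\
&\quad + CD^{-1}(e_{nr}e_{Ms}^tY^t + Ye_{Ms}e_{nr}^t)D^{-1}(e_{np}e_{Mq}^tY^t + Ye_{Mq}e_{np}^t)D^{-1} \\
&\quad - CD^{-1}(e_{np}e_{nr}^t + e_{nr}e_{np}^t)D^{-1}\,\mathbb{I}_{\{q=s\}}.
\end{aligned}$$

Proceeding exactly as before, it is quite easy to get the following bound. It
seems reasonable to omit the details.

$$\Bigl|\sum_{p,r=1}^n\sum_{q,s=1}^M\sum_{i,j=1}^n \tilde\alpha_{pq}\tilde\alpha'_{rs}\beta_{ij}\frac{\partial^2a_{ij}}{\partial y_{pq}\partial y_{rs}}\Bigr| \le \frac{10\|\tilde\alpha\|_{HS}\|\tilde\alpha'\|_{HS}\lambda^5N}{M^2}. \tag{16}$$

Combining (14), (15), and (16), and noting that $N \le M$, $\lambda \ge 1$, and the
HS-norms of $\alpha$, $\alpha'$, $\tilde\alpha$, and $\tilde\alpha'$ are all bounded by 1, it is now easy to get that

$$\gamma_2 \le \frac{16\lambda^5}{M}.$$

Finally, note that $\mathrm{rank}(A) \le n$. Combining everything we get

$$\eta_0 \le \frac{2f_1(\lambda)\lambda^{7/2}\sqrt{N}}{M}, \qquad \eta_1 \le \frac{2\sqrt{2}f_1(\lambda)\lambda^{7/2}\sqrt{Nn}}{M},$$

$$\text{and} \qquad \eta_2 \le \frac{16f_1(\lambda)\lambda^5\sqrt{n}}{M} + \frac{8f_2(\lambda)\lambda^7N}{M^2}.$$

From this, we get

$$\kappa_0 \le \frac{4N\sqrt{2n}}{M^2}\bigl(\mathbb{E}(f_1(\lambda)^4\lambda^{14})\bigr)^{1/2},$$

$$\kappa_1 \le \frac{2\sqrt{2Nn}}{M}\bigl(\mathbb{E}(f_1(\lambda)^4\lambda^{14})\bigr)^{1/4}, \qquad \text{and}$$

$$\kappa_2 \le \frac{4\sqrt{n}}{M}\bigl(\mathbb{E}(4f_1(\lambda)\lambda^5 + 2n^{-1/2}f_2(\lambda)\lambda^7)^4\bigr)^{1/4}.$$

An application of Theorem 3.1 completes the proof. □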

5. Proofs 

5.1. Proof of Theorem 2.2. The following basic lemma due to Charles
Stein is our connection with Stein's method. For the reader's convenience,
we reproduce the proof.

Lemma 5.1. (Stein [50], page 25) Let $Z$ be a standard gaussian random
variable. Then for any random variable $W$, we have

$$d_{TV}(W, Z) \le \sup\{|\mathbb{E}(\psi(W)W - \psi'(W))| : \|\psi'\|_\infty \le 2\}.$$

Proof. Take any $u : \mathbb{R} \to [-1, 1]$. It can be verified that the function

$$\varphi(x) = e^{x^2/2}\int_{-\infty}^x e^{-t^2/2}(u(t) - \mathbb{E}u(Z))\,dt = -e^{x^2/2}\int_x^\infty e^{-t^2/2}(u(t) - \mathbb{E}u(Z))\,dt$$

is a solution to the equation

$$\varphi'(x) - x\varphi(x) = u(x) - \mathbb{E}u(Z).$$

Thus for each $x$,

$$\varphi'(x) = u(x) - \mathbb{E}u(Z) - xe^{x^2/2}\int_x^\infty e^{-t^2/2}(u(t) - \mathbb{E}u(Z))\,dt.$$

It follows that

$$\sup_{x\ge0}|\varphi'(x)| \le \Bigl(\sup_x|u(x) - \mathbb{E}u(Z)|\Bigr)\Bigl(1 + \sup_{x>0} xe^{x^2/2}\int_x^\infty e^{-t^2/2}\,dt\Bigr) \le 2\sup_x|u(x) - \mathbb{E}u(Z)| \le 4.$$

It can be verified that the same bound holds for $\sup_{x\le0}|\varphi'(x)|$ by replacing
$x$ with $-x$. Therefore, we have

$$|\mathbb{E}u(W) - \mathbb{E}u(Z)| = |\mathbb{E}(\varphi(W)W - \varphi'(W))| \le \sup\{|\mathbb{E}(\psi(W)W - \psi'(W))| : \|\psi'\|_\infty \le 4\}.$$

Since

$$d_{TV}(W, Z) = \frac12\sup\{|\mathbb{E}u(W) - \mathbb{E}u(Z)| : \|u\|_\infty \le 1\},$$

this completes the proof. □
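The proof of Lemma 5.1 relies on the classical fact that $xe^{x^2/2}\int_x^\infty e^{-t^2/2}\,dt \le 1$ for $x > 0$. A minimal numerical check of this inequality on a grid is sketched below; the grid $[0, 20]$ and the helper names are our own choices, and the tail integral is expressed through the complementary error function.

```python
import math

def gaussian_tail_integral(x: float) -> float:
    """Integral of exp(-t^2 / 2) over [x, infinity), via the complementary error function."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def scaled_tail(x: float) -> float:
    """The quantity x * exp(x^2 / 2) * integral_x^infinity exp(-t^2 / 2) dt."""
    return x * math.exp(x * x / 2.0) * gaussian_tail_integral(x)

grid = [0.05 * k for k in range(0, 401)]  # x in [0, 20]
values = [scaled_tail(x) for x in grid]
largest = max(values)
print(f"largest value on the grid: {largest:.6f}")
```

The quantity increases towards 1 as $x$ grows, which is why the constant 2 in the bound on $\sup|\varphi'|$ cannot be lowered by this argument.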
The next lemma is for technical convenience. 

Lemma 5.2. It suffices to prove Theorem 2.2 under the assumption that $g$,
$\nabla g$ and $\nabla^2g$ are uniformly bounded.

Proof. Suppose we have proved Theorem 2.2 under the said assumption.
Take any $g \in C^2(\mathbb{R}^n)$ such that $\sigma^2$ is finite. Now, if any one among $\kappa_0$, $\kappa_1$,
and $\kappa_2$ is infinite, there is nothing to prove. So let us assume that they are
all finite.

Let $h : \mathbb{R}_+ \to [0, 1]$ be a $C^\infty$ function such that $h(t) = 1$ when $t \le 1$ and
$h(t) = 0$ when $t \ge 2$. For each $\alpha > 0$ let

$$g_\alpha(x) = g(x)h(\alpha^{-1}\|x\|).$$

Clearly, as $\alpha \to \infty$,

$$d_{TV}(g(X), g_\alpha(X)) \le \mathbb{P}(g(X) \ne g_\alpha(X)) \le \mathbb{P}(\|X\| > \alpha) \to 0. \tag{17}$$

Note that for any finite $\alpha$, $g_\alpha$ and its derivatives are uniformly bounded
over $\mathbb{R}^n$. Now, since $\mathbb{E}g(X)^2$ is finite, $|g_\alpha(x)| \le |g(x)|$ for all $x$, and $g_\alpha$
converges to $g$ pointwise, the dominated convergence theorem gives

$$\lim_{\alpha\to\infty}\mathbb{E}g_\alpha(X) = \mathbb{E}g(X), \quad \text{and} \quad \lim_{\alpha\to\infty}\mathbb{E}g_\alpha(X)^2 = \mathbb{E}g(X)^2.$$

Again, since $\mathbb{E}g(X)^4$ and $\kappa_0$, $\kappa_1$ and $\kappa_2$ are all finite, the same logic shows
that

$$\lim_{\alpha\to\infty}\kappa_i(g_\alpha) = \kappa_i(g) \quad \text{for } i = 0, 1, 2.$$

These three steps combined show that if Theorem 2.2 holds for each $g_\alpha$, it
must hold for $g$ as well. This completes the proof. □

The following result is the main ingredient in the proof of Theorem 2.2.

Lemma 5.3. Let $Y = (Y_1, \dots, Y_n)$ be a vector of i.i.d. standard gaussian
random variables. Let $f : \mathbb{R}^n \to \mathbb{R}$ be an absolutely continuous function such
that $W = f(Y)$ has zero mean and unit variance. Assume that $f$ and its
derivatives have subexponential growth at infinity. Let $Y'$ be an independent
copy of $Y$ and define the function $T : \mathbb{R}^n \to \mathbb{R}$ as

$$T(y) := \int_0^1 \frac{1}{2\sqrt{t}}\,\mathbb{E}\Bigl(\sum_{i=1}^n \frac{\partial f}{\partial y_i}(y)\,\frac{\partial f}{\partial y_i}(\sqrt{t}\,y + \sqrt{1-t}\,Y')\Bigr)dt.$$

Let $h(w) = \mathbb{E}(T(Y) \mid W = w)$. Then $\mathbb{E}h(W) = 1$. If $Z$ is standard gaussian,
then

$$d_{TV}(W, Z) \le 2\,\mathbb{E}|h(W) - 1| \le 2[\mathrm{Var}(T(Y))]^{1/2},$$

where $d_{TV}$ is the total variation distance.

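Proof. Take any $\psi : \mathbb{R} \to \mathbb{R}$ so that $\psi'$ exists and is bounded. Then we have

$$\begin{aligned}
\mathbb{E}(\psi(W)W) &= \mathbb{E}\bigl(\psi(f(Y))f(Y) - \psi(f(Y))f(Y')\bigr) \\
&= \mathbb{E}\Bigl(\int_0^1 \psi(f(Y))\,\frac{d}{dt}f(\sqrt{t}\,Y + \sqrt{1-t}\,Y')\,dt\Bigr) \\
&= \mathbb{E}\Bigl(\int_0^1 \psi(f(Y))\sum_{i=1}^n\Bigl(\frac{Y_i}{2\sqrt{t}} - \frac{Y_i'}{2\sqrt{1-t}}\Bigr)\frac{\partial f}{\partial y_i}(\sqrt{t}\,Y + \sqrt{1-t}\,Y')\,dt\Bigr).
\end{aligned}$$

Now fix $t \in (0, 1)$, and let $U_t = \sqrt{t}\,Y + \sqrt{1-t}\,Y'$, and $V_t = \sqrt{1-t}\,Y - \sqrt{t}\,Y'$.
Then $U_t$ and $V_t$ are independent standard gaussian random vectors and
$Y = \sqrt{t}\,U_t + \sqrt{1-t}\,V_t$. Taking any $i$, and using the integration-by-parts
formula for the gaussian measure (in going from the second to the third line
below), we get

$$\begin{aligned}
\mathbb{E}\Bigl(\psi(f(Y))\Bigl(\frac{Y_i}{2\sqrt{t}} - \frac{Y_i'}{2\sqrt{1-t}}\Bigr)\frac{\partial f}{\partial y_i}(U_t)\Bigr)
&= \frac{1}{2\sqrt{t(1-t)}}\,\mathbb{E}\Bigl(\psi\bigl(f(\sqrt{t}\,U_t + \sqrt{1-t}\,V_t)\bigr)V_{t,i}\,\frac{\partial f}{\partial y_i}(U_t)\Bigr) \\
&= \frac{1}{2\sqrt{t}}\,\mathbb{E}\Bigl(\psi'(f(Y))\,\frac{\partial f}{\partial y_i}(Y)\,\frac{\partial f}{\partial y_i}(U_t)\Bigr).
\end{aligned}$$

Here the first expression is rewritten using $V_{t,i} = \sqrt{1-t}\,Y_i - \sqrt{t}\,Y_i'$, and the
last line is the result of the integration by parts in the variable $V_{t,i}$.
Note that we need the growth condition on the derivatives of $f$ to carry out
the interchange of expectation and integration and the integration-by-parts.
From the above, we have

$$\begin{aligned}
\mathbb{E}(\psi(W)W) &= \mathbb{E}\Bigl(\psi'(W)\int_0^1 \frac{1}{2\sqrt{t}}\sum_{i=1}^n \frac{\partial f}{\partial y_i}(Y)\,\frac{\partial f}{\partial y_i}(\sqrt{t}\,Y + \sqrt{1-t}\,Y')\,dt\Bigr) \\
&= \mathbb{E}(\psi'(W)T(Y)) = \mathbb{E}(\psi'(W)h(W)).
\end{aligned}$$

The assertion that $\mathbb{E}(h(W)) = 1$ now follows by taking $\psi(w) = w$ and using
the hypothesis that $\mathbb{E}(W^2) = 1$. Also, easily, we have the upper bound

$$|\mathbb{E}(\psi(W)W - \psi'(W))| = |\mathbb{E}(\psi'(W)(h(W) - 1))| \le \|\psi'\|_\infty\,\mathbb{E}|h(W) - 1|.$$

A simple application of Lemma 5.1 completes the proof. □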
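To make the function $T$ of Lemma 5.3 concrete, the sketch below evaluates it by quadrature in one dimension for the illustrative choice $f(y) = (y^2 - 1)/\sqrt{2}$, which has zero mean and unit variance under the standard gaussian law. This $f$, the quadrature orders, and all names in the code are assumptions made for the example. For this $f$ a direct calculation gives $T(y) = y^2$, so $\mathbb{E}T(Y) = 1$ and $\mathrm{Var}(T(Y)) = 2$.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def f_prime(y):
    """Derivative of the illustrative choice f(y) = (y^2 - 1) / sqrt(2)."""
    return SQRT2 * y

# Gauss-Hermite nodes/weights for the weight exp(-x^2 / 2); normalise to N(0, 1).
gh_nodes, gh_weights = np.polynomial.hermite_e.hermegauss(40)
gh_weights = gh_weights / gh_weights.sum()

# Gauss-Legendre nodes/weights on [-1, 1], mapped to s in [0, 1].
gl_nodes, gl_weights = np.polynomial.legendre.leggauss(40)
s_nodes = 0.5 * (gl_nodes + 1.0)
s_weights = 0.5 * gl_weights

def T(y: float) -> float:
    """T(y) after the substitution t = s^2, which removes the 1 / (2 sqrt(t)) factor."""
    total = 0.0
    for s, ws in zip(s_nodes, s_weights):
        inner = np.sum(gh_weights * f_prime(s * y + np.sqrt(1.0 - s * s) * gh_nodes))
        total += ws * f_prime(y) * inner
    return float(total)

mean_T = float(np.sum(gh_weights * np.array([T(y) for y in gh_nodes])))
var_T = float(np.sum(gh_weights * np.array([(T(y) - mean_T) ** 2 for y in gh_nodes])))
print(f"E T(Y) = {mean_T:.6f}, Var T(Y) = {var_T:.6f}")
```

In this example $2[\mathrm{Var}(T(Y))]^{1/2} = 2\sqrt{2}$ exceeds 1, so the bound of the lemma is uninformative, which is consistent with $W$ being a centered chi-square variable rather than anything close to gaussian.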

Theorem 2.2 follows from the above lemma if we bound $\mathrm{Var}(T(Y))$ using
the gaussian Poincaré inequality, as we do below.

Proof of Theorem 2.2. First off, by Lemma 5.2 we can assume that $g$, $\nabla g$,
and $\nabla^2g$ are uniformly bounded and hence apply Lemma 5.3 without having
to check for the growth conditions at infinity. Let $Y_1, \dots, Y_n$ be independent
standard gaussian random variables and $\varphi_1, \dots, \varphi_n$ be functions such that
$X_i = \varphi_i(Y_i)$ and $\|\varphi_i'\|_\infty \le c_1$, $\|\varphi_i''\|_\infty \le c_2$ for each $i$. Define a function
$\varphi : \mathbb{R}^n \to \mathbb{R}^n$ as $\varphi(y_1, \dots, y_n) := (\varphi_1(y_1), \dots, \varphi_n(y_n))$ and let

$$f(y) := g(\varphi(y)).$$

Then $W = g(X) = f(Y)$. It is not difficult to see, through centering and
scaling, that it suffices to prove Theorem 2.2 under the assumptions that
$\mathbb{E}(W) = 0$ and $\mathbb{E}(W^2) = 1$ (this is where the $\sigma^2$ appears in the error bound).
Now define $T$ as in Lemma 5.3:

$$T(y) := \int_0^1 \frac{1}{2\sqrt{t}}\,\mathbb{E}\Bigl(\sum_{i=1}^n \frac{\partial f}{\partial y_i}(y)\,\frac{\partial f}{\partial y_i}(\sqrt{t}\,y + \sqrt{1-t}\,Y')\Bigr)dt,$$

where $Y'$ is an independent copy of $Y$. Our strategy for bounding $\mathrm{Var}(T)$
is to simply use the gaussian Poincaré inequality:

$$\mathrm{Var}(T(Y)) \le \mathbb{E}\|\nabla T(Y)\|^2.$$

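The boundedness of $\nabla^2g$ ensures that we can move the derivative inside the
integrals when differentiating $T$:

$$\frac{\partial T}{\partial y_i}(y) = \mathbb{E}\int_0^1 \frac{1}{2\sqrt{t}}\sum_{j=1}^n \frac{\partial^2f}{\partial y_i\partial y_j}(y)\,\frac{\partial f}{\partial y_j}(\sqrt{t}\,y + \sqrt{1-t}\,Y')\,dt + \mathbb{E}\int_0^1 \frac12\sum_{j=1}^n \frac{\partial f}{\partial y_j}(y)\,\frac{\partial^2f}{\partial y_i\partial y_j}(\sqrt{t}\,y + \sqrt{1-t}\,Y')\,dt.$$

Now for each $t \in [0, 1]$, let $U_t = \sqrt{t}\,Y + \sqrt{1-t}\,Y'$. With several applications
of Jensen's inequality and the inequality $(a + b)^2 \le 2a^2 + 2b^2$, we get

$$\mathbb{E}\|\nabla T(Y)\|^2 \le \int_0^1 \frac{1}{\sqrt{t}}\,\mathbb{E}\sum_{i=1}^n\Bigl(\sum_{j=1}^n \frac{\partial^2f}{\partial y_i\partial y_j}(Y)\,\frac{\partial f}{\partial y_j}(U_t)\Bigr)^2dt + \frac12\int_0^1 \mathbb{E}\sum_{i=1}^n\Bigl(\sum_{j=1}^n \frac{\partial f}{\partial y_j}(Y)\,\frac{\partial^2f}{\partial y_i\partial y_j}(U_t)\Bigr)^2dt. \tag{18}$$

Now, we have

$$\frac{\partial f}{\partial y_i}(y) = \frac{\partial g}{\partial x_i}(\varphi(y))\,\varphi_i'(y_i).$$

Thus, if $i \ne j$,

$$\frac{\partial^2f}{\partial y_i\partial y_j}(y) = \frac{\partial^2g}{\partial x_i\partial x_j}(\varphi(y))\,\varphi_i'(y_i)\varphi_j'(y_j).$$

On the other hand,

$$\frac{\partial^2f}{\partial y_i^2}(y) = \frac{\partial^2g}{\partial x_i^2}(\varphi(y))\,\varphi_i'(y_i)^2 + \frac{\partial g}{\partial x_i}(\varphi(y))\,\varphi_i''(y_i).$$

Thus, for any $y, u \in \mathbb{R}^n$,

$$\begin{aligned}
\sum_{i=1}^n\Bigl(\sum_{j=1}^n \frac{\partial^2f}{\partial y_i\partial y_j}(y)\,\frac{\partial f}{\partial y_j}(u)\Bigr)^2
&\le 2\sum_{i=1}^n\Bigl(\sum_{j=1}^n \frac{\partial^2g}{\partial x_i\partial x_j}(\varphi(y))\,\varphi_i'(y_i)\varphi_j'(y_j)\,\frac{\partial g}{\partial x_j}(\varphi(u))\,\varphi_j'(u_j)\Bigr)^2 \\
&\quad + 2\sum_{i=1}^n\Bigl(\frac{\partial g}{\partial x_i}(\varphi(y))\,\varphi_i''(y_i)\,\frac{\partial g}{\partial x_i}(\varphi(u))\,\varphi_i'(u_i)\Bigr)^2 \\
&\le 2c_1^6\|\nabla^2g(\varphi(y))\|^2\|\nabla g(\varphi(u))\|^2 + 2c_1^2c_2^2\sum_{i=1}^n\Bigl(\frac{\partial g}{\partial x_i}(\varphi(y))\Bigr)^2\Bigl(\frac{\partial g}{\partial x_i}(\varphi(u))\Bigr)^2.
\end{aligned}$$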

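Let us now fix $t \in [0, 1]$, replace $y$ by $Y$ and $u$ by $U_t$ and use the above
inequality to bound the first integrand on the right hand side of (18). First,
note that since $U_t$ has the same law as $Y$,

$$\begin{aligned}
\mathbb{E}\bigl(\|\nabla^2g(\varphi(Y))\|^2\|\nabla g(\varphi(U_t))\|^2\bigr)
&\le \bigl(\mathbb{E}\|\nabla^2g(\varphi(Y))\|^4\bigr)^{1/2}\bigl(\mathbb{E}\|\nabla g(\varphi(U_t))\|^4\bigr)^{1/2} \\
&= \bigl(\mathbb{E}\|\nabla^2g(X)\|^4\bigr)^{1/2}\bigl(\mathbb{E}\|\nabla g(X)\|^4\bigr)^{1/2} = \kappa_2^2\kappa_1^2.
\end{aligned}$$

For the same reason, we also have

$$\mathbb{E}\sum_{i=1}^n\Bigl(\frac{\partial g}{\partial x_i}(\varphi(Y))\Bigr)^2\Bigl(\frac{\partial g}{\partial x_i}(\varphi(U_t))\Bigr)^2 \le \mathbb{E}\sum_{i=1}^n\Bigl(\frac{\partial g}{\partial x_i}(X)\Bigr)^4 = \kappa_0^2.$$

Combining, we get

$$\mathbb{E}\sum_{i=1}^n\Bigl(\sum_{j=1}^n \frac{\partial^2f}{\partial y_i\partial y_j}(Y)\,\frac{\partial f}{\partial y_j}(U_t)\Bigr)^2 \le 2c_1^6\kappa_1^2\kappa_2^2 + 2c_1^2c_2^2\kappa_0^2.$$

Since this does not depend on $t$, it is now easy to see that the first term
on the right hand side of (18) is bounded by $4c_1^6\kappa_1^2\kappa_2^2 + 4c_1^2c_2^2\kappa_0^2$. In a very similar
manner, the second term can be bounded by $c_1^6\kappa_1^2\kappa_2^2 + c_1^2c_2^2\kappa_0^2$. Combining,
and applying the inequality $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$, we finish the proof of the first
part of the theorem.

To prove the second part, let $X = AY$, where $Y$ is a vector of independent
standard gaussian random variables and $A$ is a matrix such that $\Sigma = AA^t$.
Define $h : \mathbb{R}^n \to \mathbb{R}$ as $h(y) = g(Ay)$. It is easy to verify that

$$\|\nabla h(y)\| \le \|\Sigma\|^{1/2}\|\nabla g(Ay)\| \quad \text{and} \quad \|\nabla^2h(y)\| \le \|\Sigma\|\,\|\nabla^2g(Ay)\|.$$

The rest is straightforward from the first part of the theorem applied to
$h(Y)$ instead of $g(X)$, noting that for the standard gaussian distribution we
have $c_1 = 1$ and $c_2 = 0$. □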
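The two norm inequalities for $h(y) = g(Ay)$ follow from the chain rule, $\nabla h(y) = A^t\nabla g(Ay)$ and $\nabla^2h(y) = A^t\nabla^2g(Ay)A$, together with $\|A\|^2 = \|AA^t\|$. The sketch below checks them numerically for the illustrative test function $g(x) = \sum_i x_i^3/3$; the matrix `A`, the point `y`, and the choice of $g$ are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n))
Sigma = A @ A.T                      # covariance of X = A Y
sigma_norm = np.linalg.norm(Sigma, 2)

def grad_g(x):
    """Gradient of the illustrative g(x) = sum_i x_i^3 / 3."""
    return x ** 2

def hess_g(x):
    """Hessian of the same g."""
    return np.diag(2.0 * x)

y = rng.standard_normal(n)
x = A @ y
grad_h = A.T @ grad_g(x)             # chain rule for h(y) = g(A y)
hess_h = A.T @ hess_g(x) @ A

grad_ok = np.linalg.norm(grad_h) <= np.sqrt(sigma_norm) * np.linalg.norm(grad_g(x)) + 1e-12
hess_ok = np.linalg.norm(hess_h, 2) <= sigma_norm * np.linalg.norm(hess_g(x), 2) + 1e-12
print(bool(grad_ok), bool(hess_ok))
```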

5.2. Proof of Theorem 3.1. Let us begin with some bounds on matrix
differentials. Inequality (5) from Lemma 4.1 is particularly useful.

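Lemma 5.4. Let $A = (a_{ij})_{1\le i,j\le n}$ be an arbitrary square matrix with
complex entries. Let $f(z) = \sum_{m=0}^\infty b_mz^m$ be an entire function. Define two
associated entire functions $f_1$ and $f_2$ as $f_1(z) = \sum_{m=1}^\infty m|b_m|z^{m-1}$ and
$f_2(z) = \sum_{m=2}^\infty m(m-1)|b_m|z^{m-2}$. Then for each $i, j$, we have

$$\frac{\partial}{\partial a_{ij}}\mathrm{Tr}(f(A)) = (f'(A))_{ji}.$$

This gives the bounds

$$\Bigl|\frac{\partial}{\partial a_{ij}}\mathrm{Tr}(f(A))\Bigr| \le f_1(\|A\|) \ \text{ for each } i, j, \quad \text{and} \quad \sum_{i,j}\Bigl|\frac{\partial}{\partial a_{ij}}\mathrm{Tr}(f(A))\Bigr|^2 \le \mathrm{rank}(A)\,f_1(\|A\|)^2.$$

Next, for each $1 \le i, j, k, l \le n$, let

$$h_{ij,kl} = \frac{\partial^2}{\partial a_{ij}\partial a_{kl}}\mathrm{Tr}(f(A)).$$

Let $H$ be the $n^2\times n^2$ matrix $(h_{ij,kl})_{1\le i,j,k,l\le n}$. Then $\|H\| \le f_2(\|A\|)$.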

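Proof. For each $i$, let $e_i$ be the $i$th coordinate vector in $\mathbb{R}^n$, i.e. the vector
whose $i$th component is 1 and the rest are zero. Take any integer $m \ge 1$. A
simple computation gives

$$\frac{\partial}{\partial a_{ij}}\mathrm{Tr}(A^m) = \sum_{r=0}^{m-1}\mathrm{Tr}\Bigl(A^r\frac{\partial A}{\partial a_{ij}}A^{m-r-1}\Bigr) = m\,\mathrm{Tr}\Bigl(\frac{\partial A}{\partial a_{ij}}A^{m-1}\Bigr).$$

Thus,

$$\frac{\partial}{\partial a_{ij}}\mathrm{Tr}(f(A)) = \mathrm{Tr}\Bigl(\frac{\partial A}{\partial a_{ij}}f'(A)\Bigr) = \mathrm{Tr}(e_ie_j^tf'(A)) = (f'(A))_{ji}.$$

The first inequality follows from this, since $|(f'(A))_{ji}| \le \|f'(A)\| \le f_1(\|A\|)$.
Next, recall that if $B$ is a square matrix and $r = \mathrm{rank}(B)$, then $\|B\|_{HS} \le
\sqrt{r}\,\|B\|$. This holds because

$$\|B\|_{HS}^2 = \sum_i \lambda_i^2,$$

where $\lambda_i$ are the singular values of $B$, whereas $\|B\| = \max_i|\lambda_i|$. Thus, if we
let $r = \mathrm{rank}(A)$, then

$$\Bigl(\sum_{i,j}\Bigl|\frac{\partial}{\partial a_{ij}}\mathrm{Tr}(f(A))\Bigr|^2\Bigr)^{1/2} = \|f'(A)\|_{HS} \le \sum_{m=1}^\infty m|b_m|\,\|A^{m-1}\|_{HS} \le \sqrt{r}\sum_{m=1}^\infty m|b_m|\,\|A\|^{m-1} = \sqrt{r}\,f_1(\|A\|).$$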

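This proves the first claim. Next, fix some $m \ge 2$. Another simple
computation shows that

$$\frac{\partial^2}{\partial a_{ij}\partial a_{kl}}\mathrm{Tr}(A^m) = m\sum_{r=0}^{m-2}\mathrm{Tr}\Bigl(\frac{\partial A}{\partial a_{ij}}A^r\frac{\partial A}{\partial a_{kl}}A^{m-r-2}\Bigr) = m\sum_{r=0}^{m-2}\mathrm{Tr}(e_ie_j^tA^re_ke_l^tA^{m-r-2}).$$

Now let $B = (b_{ij})_{1\le i,j\le n}$ and $C = (c_{ij})_{1\le i,j\le n}$ be arbitrary arrays of complex
numbers such that $\sum_{i,j}|b_{ij}|^2 = \sum_{i,j}|c_{ij}|^2 = 1$. Using the above expression,
we get

$$\sum_{i,j,k,l} b_{ij}c_{kl}\frac{\partial^2}{\partial a_{ij}\partial a_{kl}}\mathrm{Tr}(A^m) = m\sum_{r=0}^{m-2}\mathrm{Tr}(BA^rCA^{m-r-2}).$$

Now, by Lemma 4.1, it follows that

$$|\mathrm{Tr}(BA^rCA^{m-r-2})| \le \|B\|_{HS}\|C\|_{HS}\|A^r\|\,\|A^{m-r-2}\| \le \|A\|^{m-2}.$$

Thus,

$$\Bigl|\sum_{i,j,k,l} b_{ij}c_{kl}h_{ij,kl}\Bigr| \le \sum_{m=2}^\infty m(m-1)|b_m|\,\|A\|^{m-2} = f_2(\|A\|).$$

Since this holds for all $B, C$ such that $\sum_{i,j}|b_{ij}|^2 = \sum_{i,j}|c_{ij}|^2 = 1$, the proof
is done. □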
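The derivative formula of Lemma 5.4 can be checked numerically in the monomial case $f(z) = z^m$, where it reads $\partial\,\mathrm{Tr}(A^m)/\partial a_{ij} = m(A^{m-1})_{ji}$. The following sketch compares this closed form with central finite differences and also checks the entrywise bound $f_1(\|A\|) = m\|A\|^{m-1}$. The matrix size, the exponent, the seed, and the step `h` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 5
A = rng.standard_normal((n, n))

def trace_power(a, m):
    """Tr(a^m) for a square matrix a and integer m >= 1."""
    return float(np.trace(np.linalg.matrix_power(a, m)))

# Closed form from the lemma with f(z) = z^m: the (i, j) partial derivative is (f'(A))_{ji}.
closed_form = m * np.linalg.matrix_power(A, m - 1).T

# Central finite differences, entry by entry.
h = 1e-6
numeric = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = h
        numeric[i, j] = (trace_power(A + E, m) - trace_power(A - E, m)) / (2 * h)

max_error = float(np.max(np.abs(numeric - closed_form)))
entry_bound = m * np.linalg.norm(A, 2) ** (m - 1)  # f_1(||A||) for f(z) = z^m
print(f"largest discrepancy: {max_error:.2e}, entry bound: {entry_bound:.4f}")
```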



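Proof of Theorem 3.1. Let all notation be as in the statement of the
theorem. For any $n\times n$ matrix $B$, let $\psi(B) = \mathrm{Tr}\,f(B)$. Define the map
$g : \mathbb{R}^{\mathcal{I}} \to \mathbb{C}$ as $g = \psi\circ A$, that is,

$$g(x) = \mathrm{Tr}\,f(A(x)).$$

It is useful to recall the following basic fact for the subsequent computations:
For any $k$ and any vector $x \in \mathbb{C}^k$,

$$\|x\| = \sup\Bigl\{\Bigl|\sum_i x_iy_i\Bigr| : y \in \mathbb{C}^k,\ \|y\| = 1\Bigr\}. \tag{19}$$

Using this and the definition of $\gamma_1$ we get

$$\|\nabla g(x)\| = \sup_{\alpha\in\mathcal{R}}\Bigl|\sum_{u\in\mathcal{I}}\alpha_u\frac{\partial g}{\partial x_u}(x)\Bigr| = \sup_{\alpha\in\mathcal{R}}\Bigl|\sum_{u\in\mathcal{I}}\sum_{i,j=1}^n\alpha_u\frac{\partial\psi}{\partial a_{ij}}(A(x))\frac{\partial a_{ij}}{\partial x_u}(x)\Bigr| \le \gamma_1(x)\Bigl(\sum_{i,j=1}^n\Bigl|\frac{\partial\psi}{\partial a_{ij}}(A(x))\Bigr|^2\Bigr)^{1/2}.$$

Now suppose $f_1$ is defined as in Lemma 5.4. Applying the second bound
from Lemma 5.4 to the last term in the above expression, we get

$$\|\nabla g(x)\| \le \gamma_1(x)f_1(\|A(x)\|)\sqrt{\mathrm{rank}(A(x))} = \eta_1(x). \tag{20}$$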

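Again note that for any $u \in \mathcal{I}$, by Lemma 5.4 and the definition of $\gamma_0$, we
have

$$\Bigl|\frac{\partial g}{\partial x_u}(x)\Bigr| = \Bigl|\mathrm{Tr}\Bigl(f'(A)\frac{\partial A}{\partial x_u}\Bigr)\Bigr| \le \gamma_0(x)f_1(\|A(x)\|) = \eta_0(x).$$

Thus,

$$\sum_{u\in\mathcal{I}}\Bigl|\frac{\partial g}{\partial x_u}(x)\Bigr|^4 \le \max_{u\in\mathcal{I}}\Bigl|\frac{\partial g}{\partial x_u}(x)\Bigr|^2\sum_{u\in\mathcal{I}}\Bigl|\frac{\partial g}{\partial x_u}(x)\Bigr|^2 \le \eta_0(x)^2\eta_1(x)^2. \tag{21}$$

Next, note that

$$\frac{\partial^2g}{\partial x_u\partial x_v}(x) = \sum_{i,j=1}^n\frac{\partial\psi}{\partial a_{ij}}(A(x))\frac{\partial^2a_{ij}}{\partial x_u\partial x_v}(x) + \sum_{i,j,k,l=1}^n\frac{\partial^2\psi}{\partial a_{ij}\partial a_{kl}}(A(x))\frac{\partial a_{ij}}{\partial x_u}(x)\frac{\partial a_{kl}}{\partial x_v}(x).$$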

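Thus, if $\nabla^2g$ denotes the Hessian matrix of $g$, then

$$\begin{aligned}
\|\nabla^2g(x)\| &= \sup_{\alpha,\alpha'\in\mathcal{R}}\Bigl|\sum_{u,v\in\mathcal{I}}\alpha_u\alpha'_v\frac{\partial^2g}{\partial x_u\partial x_v}(x)\Bigr| \\
&\le \sup_{\alpha,\alpha'\in\mathcal{R}}\Bigl|\sum_{u,v\in\mathcal{I}}\sum_{i,j=1}^n\alpha_u\alpha'_v\frac{\partial\psi}{\partial a_{ij}}(A(x))\frac{\partial^2a_{ij}}{\partial x_u\partial x_v}(x)\Bigr| \\
&\quad + \sup_{\alpha,\alpha'\in\mathcal{R}}\Bigl|\sum_{u,v\in\mathcal{I}}\sum_{i,j,k,l=1}^n\alpha_u\alpha'_v\frac{\partial^2\psi}{\partial a_{ij}\partial a_{kl}}(A(x))\frac{\partial a_{ij}}{\partial x_u}(x)\frac{\partial a_{kl}}{\partial x_v}(x)\Bigr|.
\end{aligned}$$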

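Now, by the definition of $\gamma_2(x)$ and Lemma 5.4, we have

$$\sup_{\alpha,\alpha'\in\mathcal{R}}\Bigl|\sum_{u,v\in\mathcal{I}}\sum_{i,j=1}^n\alpha_u\alpha'_v\frac{\partial\psi}{\partial a_{ij}}(A(x))\frac{\partial^2a_{ij}}{\partial x_u\partial x_v}(x)\Bigr| \le \gamma_2(x)\Bigl(\sum_{i,j=1}^n\Bigl|\frac{\partial\psi}{\partial a_{ij}}(A(x))\Bigr|^2\Bigr)^{1/2} \le \gamma_2(x)f_1(\|A(x)\|)\sqrt{\mathrm{rank}(A(x))}.$$

For the second term, note that by the definition of the operator norm and
the identity (19),

$$\begin{aligned}
\sup_{\alpha,\alpha'\in\mathcal{R}}\Bigl|\sum_{u,v\in\mathcal{I}}\sum_{i,j,k,l=1}^n\alpha_u\alpha'_v\frac{\partial^2\psi}{\partial a_{ij}\partial a_{kl}}(A(x))\frac{\partial a_{ij}}{\partial x_u}(x)\frac{\partial a_{kl}}{\partial x_v}(x)\Bigr|
&\le \|\nabla^2\psi(A(x))\|\sup_{\alpha\in\mathcal{R}}\sum_{i,j=1}^n\Bigl|\sum_{u\in\mathcal{I}}\alpha_u\frac{\partial a_{ij}}{\partial x_u}(x)\Bigr|^2 \\
&= \|\nabla^2\psi(A(x))\|\sup_{\alpha\in\mathcal{R},\,\beta\in\mathcal{S}}\Bigl|\sum_{i,j=1}^n\sum_{u\in\mathcal{I}}\alpha_u\beta_{ij}\frac{\partial a_{ij}}{\partial x_u}(x)\Bigr|^2.
\end{aligned}$$

Using the third bound from Lemma 5.4 and the definition of $\gamma_1(x)$, we now
get

$$\sup_{\alpha,\alpha'\in\mathcal{R}}\Bigl|\sum_{u,v\in\mathcal{I}}\sum_{i,j,k,l=1}^n\alpha_u\alpha'_v\frac{\partial^2\psi}{\partial a_{ij}\partial a_{kl}}(A(x))\frac{\partial a_{ij}}{\partial x_u}(x)\frac{\partial a_{kl}}{\partial x_v}(x)\Bigr| \le f_2(\|A(x)\|)\,\gamma_1(x)^2.$$

Combining the bounds obtained in the last two steps, we have

$$\|\nabla^2g(x)\| \le \gamma_2(x)f_1(\|A(x)\|)\sqrt{\mathrm{rank}(A(x))} + \gamma_1(x)^2f_2(\|A(x)\|) = \eta_2(x). \tag{22}$$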

Finally, since $g$ is defined on a real domain, $\nabla\,\mathrm{Re}\,g = \mathrm{Re}\,\nabla g$ and
$\nabla^2\,\mathrm{Re}\,g = \mathrm{Re}\,\nabla^2g$. Thus, $\|\nabla\,\mathrm{Re}\,g(x)\| \le \|\nabla g(x)\|$ and $\|\nabla^2\,\mathrm{Re}\,g(x)\| \le
\|\nabla^2g(x)\|$. The proof is now completed by using (20), (21), and (22) to
bound $\kappa_1$, $\kappa_0$, and $\kappa_2$ in Theorem 2.2. The second part follows from the
second part of Theorem 2.2. □